Automated Focus Extraction for Question Answering over Topic Maps
TMRA’09, Leipzig
Rani Pinchuk, Alexander Mikhailian and Tiphaine Dalmas
Context: domain-portable question answering over Topic Maps
• Partly funded by the Flemish government as part of the ITEA2 project LINDO (ITEA2-06011).
• The research on domain-portable question answering over Topic Maps is carried out within the Belgian part of the LINDO project.
Why Topic Maps?
• The space industry needs a solution to the knowledge retention problem.
• More structured than mind maps, less formal than RDF/OWL.
• Allows information to be organized in an ontological view.
• An ISO standard (ISO/IEC 13250).
Why Topic Maps?
Who is the composer of La Bohème?
→ Puccini
LINDO-BE General Architecture
[Figure: components Time Exp. Extractor, Focus Extractor, Anchorer, Graph Reducer, Topic Map Engine and Answer Extractor; input: Question; output: Answer.]
Question Focus
The focus is the type of the answer, expressed in the terminology of the question.
Who is the composer of La Bohème?
→ Puccini (focus: composer)
Focus
• Asking Point (AP), explicit: “Who is the librettist of La Tilda?” (AP = librettist)
• Expected Answer Type (EAT), implicit: “Who wrote the libretto for La Tilda?” (EAT = HUMAN)
EAT classes: TIME, NUMERIC, DEFINITION, LOCATION, HUMAN, OTHER
Is it difficult to find the focus?
• Where was Puccini born?
• What is Puccini's place of birth?
• What is Puccini's birthplace?
• What is the birth place of Puccini?
• What city was Puccini born in?
• What place was Puccini born in?
• Where is Puccini from?
[Figure: topic map fragment. Puccini (role: person) and Lucca (role: place) are linked by a "born in" association; Lucca "is a" City.]
Why should the AP take precedence over the EAT?
“Who is the librettist of La Tilda?”
EAT = HUMAN → Person
AP = Librettist
Precision and Recall
$$P = \frac{|\{\text{relevant}\} \cap \{\text{retrieved}\}|}{|\{\text{retrieved}\}|}$$
$$R = \frac{|\{\text{relevant}\} \cap \{\text{retrieved}\}|}{|\{\text{relevant}\}|}$$
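A direct transcription of these two definitions over Python sets. The toy item sets are illustrative only, and returning 0.0 for an empty denominator is our own convention, not part of the definitions.

```python
def precision_recall(relevant: set, retrieved: set) -> tuple[float, float]:
    """Set-based precision and recall as defined above."""
    hits = relevant & retrieved  # {relevant} ∩ {retrieved}
    # Assumed convention: an empty denominator yields 0.0 instead of an error.
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Toy example: 4 relevant items, 5 retrieved items, 3 of which are relevant.
relevant = {"a", "b", "c", "d"}
retrieved = {"a", "b", "c", "x", "y"}
p, r = precision_recall(relevant, retrieved)
print(p, r)  # 0.6 0.75
```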
Why should the AP take precedence over the EAT?
“Who is the librettist of La Tilda?”
EAT = HUMAN → Person
AP = Librettist
P_AP = 57/57 = 1
P_EAT = 57/1165 ≈ 0.049
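One way to read these two numbers, which is our assumption rather than something the slide states: if the Italian Opera topic map holds 1165 topics of type Person, 57 of which are librettists, then retrieving candidates by the AP type returns only the 57 librettists, whereas retrieving by the EAT type returns all 1165 persons. The sketch below builds such a synthetic topic map (topic ids and helper names are hypothetical) and applies the precision definition, counting a candidate as relevant when it has the right type.

```python
# Hypothetical synthetic topic map: topic id -> set of topic types.
# Assumption: 57 of the 1165 Person topics are also Librettists.
N_PERSONS, N_LIBRETTISTS = 1165, 57
topics = {
    f"person{i}": {"Person", "Librettist"} if i < N_LIBRETTISTS else {"Person"}
    for i in range(N_PERSONS)
}


def candidates(topic_type: str) -> set:
    """Retrieve every topic instance of the given type as a candidate answer."""
    return {t for t, types in topics.items() if topic_type in types}


# For "Who is the librettist of La Tilda?" a candidate of the right type is a librettist.
relevant_type = candidates("Librettist")


def type_precision(topic_type: str) -> float:
    """Share of retrieved candidates that have the right type."""
    retrieved = candidates(topic_type)
    return len(retrieved & relevant_type) / len(retrieved)


p_ap = type_precision("Librettist")  # AP = Librettist
p_eat = type_precision("Person")     # EAT = HUMAN, mapped to Person
print(p_ap, round(p_eat, 3))  # 1.0 0.049
```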
Why should the AP take precedence over the EAT?
Results over 100 annotated questions:
Name Precision Recall
AP 0.311 0.30
EAT 0.089 0.21
Focus Branching
Focus Extractor Architecture
• Supervised machine learning based on the principle of maximum entropy (Maxent).
• 2100 questions have been annotated:
  • 1500 from the Li & Roth corpus
  • 500 from TREC-10
  • 100 asked over the Italian Opera topic map
• The corpus was split into 80% training and 20% test data. The evaluation was repeated 10 times, reshuffling the training and test data each time.
[Figure: focus extractor pipeline. Components: Tokenizer, POS Tagger, Syntactic Parser, Lexical Analysis, Focus Extractor; input: Question; output: Focus.]
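The evaluation protocol on this slide (ten runs, each reshuffling the 2100 questions and cutting them 80/20) can be sketched with Python's standard `random` module. The function name `shuffled_splits` and the fixed seed are illustrative assumptions, and training and scoring the Maxent classifiers on each split is left out.

```python
import random


def shuffled_splits(items, n_runs=10, train_fraction=0.8, seed=0):
    """Yield (train, test) pairs; every run reshuffles the data and cuts it 80/20."""
    rng = random.Random(seed)  # fixed seed only to make the sketch reproducible
    data = list(items)
    cut = round(len(data) * train_fraction)
    for _ in range(n_runs):
        rng.shuffle(data)
        # Slicing copies, so later reshuffles do not alter splits already yielded.
        yield data[:cut], data[cut:]


# Placeholders standing in for the 2100 annotated questions.
questions = [f"q{i}" for i in range(2100)]
for train, test in shuffled_splits(questions):
    print(len(train), len(test))  # 1680 420 on every run
```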
Questions Annotation
Asking Point: the AP classifier labels every token as AP (asking point) or O (other):
O: What | AP: opera | O: did | O: Puccini | O: write | O: ?
Expected Answer Type: the EAT classifier labels every question with one EAT class:
• HUMAN: Who is Puccini
• DEFINITION: What is Tosca?
• LOCATION: Where did Dante die?
• TIME: When did Puccini die?
• NUMERIC: How many characters have been killed by poisoning?
• OTHER: What did Heinrich Heine write?
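The slides state that the classifiers are based on maximum entropy (Maxent) but do not list their features. Below, a minimal sketch of a Maxent EAT classifier (multinomial logistic regression trained by gradient ascent on the log-likelihood) in plain Python, trained on the six annotated examples above. The feature set (lower-cased words plus the first token), the learning rate, the L2 penalty and the class name `MaxEnt` are our assumptions, not the authors' configuration; the AP classifier would instead assign a label to each token.

```python
import math
from collections import defaultdict


def features(question: str) -> list[str]:
    """Assumed minimal feature set: lower-cased words plus the first token."""
    tokens = question.lower().replace("?", " ").split()
    return [f"w={t}" for t in tokens] + [f"first={tokens[0]}"]


class MaxEnt:
    """Multinomial logistic regression (maximum entropy) with sparse binary features."""

    def __init__(self, labels):
        self.labels = list(labels)
        self.w = defaultdict(float)  # (feature, label) -> weight

    def probs(self, feats):
        scores = [sum(self.w[(f, y)] for f in feats) for y in self.labels]
        m = max(scores)  # subtract the max score for numerical stability
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        return [e / z for e in exps]

    def train(self, data, epochs=100, lr=0.5, l2=0.01):
        for _ in range(epochs):
            for question, gold in data:
                feats = features(question)
                p = self.probs(feats)
                for y, py in zip(self.labels, p):
                    # Log-likelihood gradient: observed minus expected feature count.
                    grad = (1.0 if y == gold else 0.0) - py
                    for f in feats:
                        # Small L2 penalty applied to the active weights only.
                        self.w[(f, y)] += lr * (grad - l2 * self.w[(f, y)])

    def predict(self, question):
        p = self.probs(features(question))
        return self.labels[max(range(len(p)), key=p.__getitem__)]


training_data = [
    ("Who is Puccini", "HUMAN"),
    ("What is Tosca?", "DEFINITION"),
    ("Where did Dante die?", "LOCATION"),
    ("When did Puccini die?", "TIME"),
    ("How many characters have been killed by poisoning?", "NUMERIC"),
    ("What did Heinrich Heine write?", "OTHER"),
]
model = MaxEnt(sorted({y for _, y in training_data}))
model.train(training_data)
print(model.predict("Where was Puccini born?"))  # an unseen paraphrase
```

With six examples this only demonstrates the mechanics; the reported results rely on the 2100-question corpus.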
AP Results
Class Precision Recall F-Score
AskingPoint 0.854 0.734 0.789
Other 0.973 0.987 0.980
EAT Results
Class Precision Recall F-Score
DEFINITION 0.887 0.800 0.841
LOCATION 0.834 0.812 0.821
HUMAN 0.904 0.753 0.820
TIME 0.880 0.802 0.838
NUMERIC 0.943 0.782 0.854
OTHER 0.746 0.893 0.812
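The F-Score columns are consistent with the balanced F-measure, F = 2PR / (P + R). Recomputing it from the rounded precision and recall reproduces the AP table and the DEFINITION row, while the other EAT rows differ by at most 0.002. A plausible explanation, which the slides do not confirm, is that the reported F-scores were averaged over the ten runs rather than computed from the averaged P and R.

```python
def f1(precision: float, recall: float) -> float:
    """Balanced F-score: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


# (precision, recall, reported F-score) copied from the AP and EAT tables.
rows = {
    "AskingPoint": (0.854, 0.734, 0.789),
    "Other (AP)": (0.973, 0.987, 0.980),
    "DEFINITION": (0.887, 0.800, 0.841),
    "LOCATION": (0.834, 0.812, 0.821),
    "HUMAN": (0.904, 0.753, 0.820),
    "TIME": (0.880, 0.802, 0.838),
    "NUMERIC": (0.943, 0.782, 0.854),
    "OTHER (EAT)": (0.746, 0.893, 0.812),
}
for name, (p, r, reported) in rows.items():
    print(f"{name:12} recomputed={f1(p, r):.3f} reported={reported:.3f}")
```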
Overall Results
Measure Value Std dev Std err
Focus (AP+EAT) 0.827 0.020 0.006
The overall results are reported as the accuracy of the classifier:
$$\text{Accuracy} = \frac{\text{correct instances}}{\text{all instances}}$$
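The slide does not spell out how the standard deviation and standard error were obtained. A minimal sketch, assuming a sample standard deviation of the per-run accuracies and a standard error of sd/√n over the ten runs; the reported 0.020 and 0.006 are consistent with that assumption. The helper names are hypothetical.

```python
import math
import statistics


def accuracy(predicted, gold):
    """Accuracy = correct instances / all instances."""
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


def summarize(accuracies):
    """Mean, sample standard deviation and standard error over repeated runs."""
    mean = statistics.mean(accuracies)
    sd = statistics.stdev(accuracies)  # sample std dev (n - 1 denominator), an assumption
    se = sd / math.sqrt(len(accuracies))
    return mean, sd, se


# Consistency check on the reported figures: 10 runs with std dev 0.020
# give a standard error of 0.020 / sqrt(10), roughly 0.0063.
print(round(0.020 / math.sqrt(10), 3))  # 0.006
```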
Prediction of Accuracy
Conclusions
• We achieved 82.7% accuracy for focus extraction.
• The specificity of the focus degrades gracefully: we first try to extract the AP and fall back to the EAT.
• The focus is identified dynamically instead of relying on a static taxonomy of question types.
• Machine learning techniques were used throughout the application stack.
• The results could be improved with more training data.
• The whole setting is domain-independent.
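The graceful degradation described above (try the AP first, fall back to the EAT) can be sketched as follows, combining the per-token labels of the AP classifier with the label of the EAT classifier. The names `Focus` and `extract_focus` are hypothetical, and the EAT labels in the example (OTHER for the opera question, LOCATION for the Dante question) are our assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Focus:
    asking_point: Optional[str]  # explicit AP such as "opera"; None if no token is tagged AP
    eat: str                     # implicit EAT class such as "LOCATION"

    def most_specific(self) -> str:
        """Prefer the more specific AP; fall back to the coarser EAT class."""
        return self.asking_point if self.asking_point else self.eat


def extract_focus(tokens, token_labels, eat_label) -> Focus:
    """Join the tokens labelled "AP" (if any) and pair them with the EAT label."""
    ap_tokens = [tok for tok, lab in zip(tokens, token_labels) if lab == "AP"]
    return Focus(" ".join(ap_tokens) or None, eat_label)


# AP present: the focus is the asking point.
f = extract_focus(["What", "opera", "did", "Puccini", "write", "?"],
                  ["O", "AP", "O", "O", "O", "O"], "OTHER")
print(f.most_specific())  # opera

# No AP tagged: fall back to the EAT.
g = extract_focus(["Where", "did", "Dante", "die", "?"], ["O"] * 5, "LOCATION")
print(g.most_specific())  # LOCATION
```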
Thank you
Questions?