Automated Focus Extraction for Question Answering over Topic Maps

TMRA’09, Leipzig

Rani Pinchuk, Alexander Mikhailian and Tiphaine Dalmas

description

This paper describes the first stage of question analysis in Question Answering over Topic Maps. It introduces the concepts of asking point and expected answer type as variations of the question focus. We identify the question focus in questions asked to a Question Answering system over Topic Maps. We use known machine learning techniques for expected answer type extraction and implement a novel approach to the asking point extraction. We also provide a mathematical model to predict the performance of the system.

Transcript of Automated Focus Extraction for Question Answering over Topic Maps

Page 1: Automated Focus Extraction for Question Answering over Topic Maps

TMRA’09, Leipzig

Automated Focus Extraction for Question Answering over Topic Maps

Rani Pinchuk, Alexander Mikhailian and Tiphaine Dalmas

Page 2: Automated Focus Extraction for Question Answering over Topic Maps

Context: domain-portable Question Answering over Topic Maps

• Partly funded by the Flemish government as part of the ITEA2 project LINDO (ITEA2-06011).

• The research towards domain-portable question answering over Topic Maps is done within the Belgian part of the LINDO project.

Page 3: Automated Focus Extraction for Question Answering over Topic Maps

Why Topic Maps?

• The space industry needs a solution to the knowledge retention problem.

• More structured than mind maps, less formal than RDF/OWL.

• Allows information to be organized in an ontological view.

• An ISO standard.

Page 4: Automated Focus Extraction for Question Answering over Topic Maps

Why Topic Maps?

Who is the composer of La Bohème?

→ Puccini

Page 5: Automated Focus Extraction for Question Answering over Topic Maps

LINDO-BE General Architecture

[Architecture diagram. Components: Question, Time Exp. Extractor, Focus Extractor, Anchorer, Graph Reducer, Topic Map Engine, Answer Extractor, Answer.]

Page 6: Automated Focus Extraction for Question Answering over Topic Maps

LINDO-BE General Architecture

[Architecture diagram, repeated. Components: Question, Time Exp. Extractor, Focus Extractor, Anchorer, Graph Reducer, Topic Map Engine, Answer Extractor, Answer.]

Page 7: Automated Focus Extraction for Question Answering over Topic Maps

Question Focus

Focus is the type of the answer in the question terminology.

Who is the composer of La Bohème?

→ Puccini

Page 8: Automated Focus Extraction for Question Answering over Topic Maps

Focus

• Asking Point (AP), explicit: “Who is the librettist of La Tilda?”

• Expected Answer Type (EAT), implicit: HUMAN: “Who wrote the libretto for La Tilda?”

EAT classes: TIME, NUMERIC, DEFINITION, LOCATION, HUMAN, OTHER

Page 9: Automated Focus Extraction for Question Answering over Topic Maps

Is it difficult to find the focus?

• Where was Puccini born?

• What is Puccini's place of birth?

• What is Puccini's birthplace?

• What is the birth place of Puccini?

• What city was Puccini born in?

• What place was Puccini born in?

• Where is Puccini from?

[Figure: topic map fragment with the labels Puccini, Lucca, born in, person, place, City, is a.]

Page 10: Automated Focus Extraction for Question Answering over Topic Maps

Why should AP take precedence over EAT?

“Who is the librettist of La Tilda?”

EAT = HUMAN → Person

AP = Librettist

Page 11: Automated Focus Extraction for Question Answering over Topic Maps

Precision and Recall

\[
P = \frac{|\{\text{relevant}\} \cap \{\text{retrieved}\}|}{|\{\text{retrieved}\}|}
\]

\[
R = \frac{|\{\text{relevant}\} \cap \{\text{retrieved}\}|}{|\{\text{relevant}\}|}
\]
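
The two set-based definitions can be sketched in a few lines of Python. This is only an illustration: the function name `precision_recall` and the toy topic identifiers are assumptions, not part of the slides.

```python
def precision_recall(relevant, retrieved):
    """Return (precision, recall) for two collections of item identifiers."""
    relevant, retrieved = set(relevant), set(retrieved)
    hits = len(relevant & retrieved)  # |{relevant} ∩ {retrieved}|
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical toy data: 3 relevant topics, 4 retrieved, 2 in common.
relevant = {"t1", "t2", "t3"}
retrieved = {"t2", "t3", "t4", "t5"}
p, r = precision_recall(relevant, retrieved)
print(p, r)
```

Returning 0.0 for an empty set is a convention chosen here to avoid a division by zero; the slides do not say how that case is handled.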

Page 12: Automated Focus Extraction for Question Answering over Topic Maps

Why should AP take precedence over EAT?

“Who is the librettist of La Tilda?”

EAT = HUMAN → Person

AP = Librettist

P_AP = 57/57 = 1

P_EAT = 57/1165 ≈ 0.049
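
One possible reading of the two precision values, written out with the set-based definition of precision. The slide does not label the counts, so it is an assumption here that 57 is the number of librettist topics and 1165 the number of person topics in the topic map:

```latex
\[
P_{AP} = \frac{|\{\text{librettists}\} \cap \{\text{librettists}\}|}{|\{\text{librettists}\}|}
       = \frac{57}{57} = 1,
\qquad
P_{EAT} = \frac{|\{\text{librettists}\} \cap \{\text{persons}\}|}{|\{\text{persons}\}|}
        = \frac{57}{1165} \approx 0.049
\]
```

Under that reading, anchoring on the AP retrieves only topics of the wanted type, while anchoring on the EAT retrieves every person.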

Page 13: Automated Focus Extraction for Question Answering over Topic Maps

Why should AP take precedence over EAT?

Results over 100 annotated questions:

Name Precision Recall

AP 0.311 0.30

EAT 0.089 0.21

Page 14: Automated Focus Extraction for Question Answering over Topic Maps

Focus Branching

Page 15: Automated Focus Extraction for Question Answering over Topic Maps

Focus Extractor Architecture

• Supervised machine learning based on the principle of maximum entropy (Maxent).

• 2100 questions have been annotated:

  • 1500 from the Li & Roth corpus

  • 500 from TREC-10

  • 100 asked over the Italian Opera topic map

• The corpus was split into 80% training and 20% test data. The evaluation was done 10 times, each time shuffling the training and test data.

[Pipeline diagram. Labels: Question, Tokenizer, POS Tagger, Syntactic Parser, Lexical Analysis, Focus Extractor, Focus.]
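
The evaluation protocol (an 80/20 split, repeated 10 times with reshuffling) could look roughly like the sketch below. The function name `shuffled_splits`, the fixed seed and the placeholder corpus are assumptions made for illustration; the slides do not describe the actual tooling.

```python
import random


def shuffled_splits(items, train_percent=80, repetitions=10, seed=0):
    """Yield (train, test) pairs, reshuffling the corpus before each split."""
    rng = random.Random(seed)  # fixed seed is an assumption, for reproducibility
    items = list(items)
    cut = len(items) * train_percent // 100  # integer arithmetic avoids float rounding
    for _ in range(repetitions):
        rng.shuffle(items)
        yield items[:cut], items[cut:]


# Placeholder standing in for the 2100 annotated questions.
corpus = [f"question {i}" for i in range(2100)]
splits = list(shuffled_splits(corpus))
train, test = splits[0]
print(len(splits), len(train), len(test))  # 10 1680 420
```

Each repetition trains on 1680 questions and tests on the remaining 420, so a question can land in the test set of several repetitions.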

Page 16: Automated Focus Extraction for Question Answering over Topic Maps

Questions Annotation

Asking Point (AP classifier)

O: What

AP: opera

O: did

O: Puccini

O: write

O: ?

Expected Answer Type (EAT classifier)

HUMAN: Who is Puccini?

DEFINITION: What is Tosca?

LOCATION: Where did Dante die?

TIME: When did Puccini die?

NUMERIC: How many characters have been killed by poisoning?

OTHER: What did Heinrich Heine write?
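
To make the EAT annotation concrete, here is a minimal sketch of a maximum entropy classifier trained on the six example questions. It is written as a plain multinomial logistic regression (the usual formulation of a conditional Maxent model) in pure Python. The feature set, learning rate and number of epochs are assumptions chosen for illustration, and this is not the authors' implementation.

```python
import math
from collections import defaultdict


def features(question):
    """Hypothetical feature set: lower-cased tokens plus the first token."""
    tokens = question.lower().rstrip("?").split()
    return ["first=" + tokens[0]] + ["w=" + t for t in tokens]


def predict_proba(weights, classes, feats):
    """Softmax over per-class sums of feature weights."""
    scores = {c: sum(weights[c][f] for f in feats) for c in classes}
    top = max(scores.values())  # subtract the maximum for numerical stability
    exps = {c: math.exp(s - top) for c, s in scores.items()}
    total = sum(exps.values())
    return {c: e / total for c, e in exps.items()}


def train(examples, epochs=200, rate=0.5):
    """Stochastic gradient ascent on the conditional log-likelihood."""
    classes = sorted({label for _, label in examples})
    weights = {c: defaultdict(float) for c in classes}
    for _ in range(epochs):
        for question, label in examples:
            feats = features(question)
            probs = predict_proba(weights, classes, feats)
            for c in classes:
                # Observed minus expected indicator for class c.
                error = (1.0 if c == label else 0.0) - probs[c]
                for f in feats:
                    weights[c][f] += rate * error
    return weights, classes


def classify(model, question):
    weights, classes = model
    probs = predict_proba(weights, classes, features(question))
    return max(probs, key=probs.get)


examples = [
    ("Who is Puccini?", "HUMAN"),
    ("What is Tosca?", "DEFINITION"),
    ("Where did Dante die?", "LOCATION"),
    ("When did Puccini die?", "TIME"),
    ("How many characters have been killed by poisoning?", "NUMERIC"),
    ("What did Heinrich Heine write?", "OTHER"),
]
model = train(examples)
print(classify(model, "Where did Dante die?"))
```

With only six training questions the model simply memorizes them; a real run would use the full annotated corpus and richer features such as POS tags and parse information.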

Page 17: Automated Focus Extraction for Question Answering over Topic Maps

AP Results

Class Precision Recall F-Score

AskingPoint 0.854 0.734 0.789

Other 0.973 0.987 0.980
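
The F-Score column is consistent with the balanced F-score, the harmonic mean of precision and recall. The slides do not state which F-measure was used, so the formula in this sketch is an assumption checked against the two rows above:

```python
def f_score(precision, recall):
    """Balanced F-score: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


print(round(f_score(0.854, 0.734), 3))  # AskingPoint row: 0.789
print(round(f_score(0.973, 0.987), 3))  # Other row: 0.98
```

When precision, recall and F-score are each averaged over several runs, the reported F-score need not equal the value computed from the averaged precision and recall, so small differences in the last digit are possible.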

Page 18: Automated Focus Extraction for Question Answering over Topic Maps

EAT Results

Class Precision Recall F-Score

DEFINITION 0.887 0.800 0.841

LOCATION 0.834 0.812 0.821

HUMAN 0.904 0.753 0.820

TIME 0.880 0.802 0.838

NUMERIC 0.943 0.782 0.854

OTHER 0.746 0.893 0.812

Page 19: Automated Focus Extraction for Question Answering over Topic Maps

Overall Results

Value Std dev Std err

Focus (AP+EAT) 0.827 0.020 0.006

The overall results are provided as the accuracy of the classifier.

Accuracy = correct instances / overall instances
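
A short sketch of how the three reported figures relate. It assumes that the standard error is the standard deviation divided by the square root of the number of runs, and that there were 10 runs; the per-run accuracies in `runs` are hypothetical, since the slides give only the aggregate values.

```python
import math
import statistics


def accuracy(correct, total):
    """Accuracy = correct instances / overall instances."""
    return correct / total


def summarize(accuracies):
    """Return (mean, sample standard deviation, standard error of the mean)."""
    mean = statistics.mean(accuracies)
    std_dev = statistics.stdev(accuracies)
    std_err = std_dev / math.sqrt(len(accuracies))
    return mean, std_dev, std_err


# Hypothetical per-run accuracies, only to show the call.
runs = [0.80, 0.82, 0.84]
mean, std_dev, std_err = summarize(runs)

# Consistency check on the reported figures, assuming 10 runs.
print(round(0.020 / math.sqrt(10), 3))  # 0.006
```

Under those assumptions the reported standard error of 0.006 follows from the reported standard deviation of 0.020.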

Page 20: Automated Focus Extraction for Question Answering over Topic Maps

Prediction of Accuracy

Page 21: Automated Focus Extraction for Question Answering over Topic Maps

Conclusions

• We achieved 82.7% accuracy for focus extraction.

• The specificity of the focus degrades gracefully (we first try to extract the AP, and fall back to the EAT).

• The focus is identified dynamically instead of relying on a static taxonomy of question types.

• Machine learning techniques were used throughout the application stack.

• The results could be improved with more training data.

• The whole setting is domain independent.
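
The graceful degradation from AP to EAT can be sketched as a simple fallback. The function name `extract_focus`, the input format (token and label pairs from the AP classifier plus one class from the EAT classifier) and the EAT class passed for the first question are assumptions for illustration only.

```python
def extract_focus(token_labels, eat_class):
    """Return ("AP", words) if any token is labelled AP, else ("EAT", eat_class)."""
    asking_point = [token for token, label in token_labels if label == "AP"]
    if asking_point:
        return "AP", " ".join(asking_point)
    return "EAT", eat_class


# A question with an explicit asking point.
labelled = [("What", "O"), ("opera", "AP"), ("did", "O"),
            ("Puccini", "O"), ("write", "O"), ("?", "O")]
print(extract_focus(labelled, "OTHER"))  # ('AP', 'opera')

# No token labelled AP: fall back to the expected answer type.
no_ap = [("Where", "O"), ("did", "O"), ("Dante", "O"), ("die", "O"), ("?", "O")]
print(extract_focus(no_ap, "LOCATION"))  # ('EAT', 'LOCATION')
```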

Page 22: Automated Focus Extraction for Question Answering over Topic Maps

Thank you

Questions?