IRSAW - Towards Semantic Annotation for Question Answering


Description: Talk given at the CNI Spring 2007 Task Force Meeting, April 16–17, 2007, Phoenix, AZ.

Transcript of IRSAW - Towards Semantic Annotation for Question Answering

Page 1: IRSAW - Towards Semantic Annotation for Question Answering

IRSAW – Towards Semantic Annotation of Documents for Question Answering

Johannes Leveling

CNI Spring 2007 Task Force Meeting, Apr. 16–17, 2007, Phoenix, AZ

IRSAW: Intelligent Information Retrieval on the Basis of a Semantically Annotated Web, funded by the DFG (Deutsche Forschungsgemeinschaft)

Page 2: IRSAW - Towards Semantic Annotation for Question Answering


Outline

1 Introduction

2 IRSAW: Basic Architecture, the MultiNet Paradigm, Processing Phases, Modules

3 Results


Page 3: IRSAW - Towards Semantic Annotation for Question Answering


Motivation

More and more information becomes available on the internet, but precise answers are difficult to find.
→ Develop a semantically based question answering (QA) framework in the IRSAW project.

General idea:

Retrieve documents from the internet,
semantically analyze and annotate the question and the documents, and
apply deep linguistic methods for question answering on document content.


Page 4: IRSAW - Towards Semantic Annotation for Question Answering


Background and Embedding in a General Strategy

Semantically based natural language processing
Knowledge representation MultiNet (concept-oriented)
Support by a large semantically oriented computational lexicon → word sense disambiguation
Homogeneous representation of lexical knowledge, general background knowledge (world knowledge), dialogue context, and the meaning of sentences and texts
Emphasis on semantic relations in all applications (not just concepts or descriptors)


Page 5: IRSAW - Towards Semantic Annotation for Question Answering


NLI-Z39.50: Beyond Descriptor Search

Predecessor: Natural language interface for the Z39.50 protocol

Natural language interface to libraries and information providers on the internet
Transformation of semantic structures into expressions of formal retrieval languages (see the sketch below)
Includes features such as de-duplication of results, phonetic search, decomposition of compounds, and query expansion with additional concepts
Example query: Where do I find books by Peter Jackson which were published in the last ten years with Springer and Addison-Wesley?
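To make the transformation step concrete, here is a minimal Python sketch (illustrative only, not NLI-Z39.50's actual code) that maps attribute-value constraints, as a semantic analysis of the example query might yield them, to a boolean retrieval expression:

# Minimal sketch: turn attribute-value constraints from a parsed
# question into a boolean retrieval expression (all names illustrative).

def to_retrieval_query(constraints):
    clauses = []
    for attribute, value in constraints:
        if isinstance(value, list):  # alternatives -> disjunction
            alts = " OR ".join(f'{attribute}="{v}"' for v in value)
            clauses.append("(" + alts + ")")
        else:
            clauses.append(f'{attribute}="{value}"')
    return " AND ".join(clauses)

# The example query, with "the last ten years" resolved relative to 2007:
print(to_retrieval_query([
    ("author", "Peter Jackson"),
    ("publisher", ["Springer", "Addison-Wesley"]),
    ("year", "1997-2007"),
]))
# author="Peter Jackson" AND (publisher="Springer" OR
# publisher="Addison-Wesley") AND year="1997-2007"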


Page 6: IRSAW - Towards Semantic Annotation for Question Answering
Page 7: IRSAW - Towards Semantic Annotation for Question Answering


IRSAW

Multi-staged approach
Questions are processed in three phases, accessing web search engines, local databases, and a semantic network knowledge base
Web documents → semantic annotation


Page 8: IRSAW - Towards Semantic Annotation for Question Answering


Architecture of IRSAW

[Architecture diagram: a natural language question enters the IRSAW framework and is processed (question processing; document preprocessing); the modules InSicht, QAP, and MIRA find answer candidates; the answer streams are merged, the candidates are validated (MAVE), scored and ranked, and the best candidate is selected as the answer.]


Page 9: IRSAW - Towards Semantic Annotation for Question Answering


IRSAW – Methods and Modules (1/3)

Apply a parser (for the German language) to produce a semantic network representation of texts (based on the knowledge representation paradigm MultiNet)

→ Allows a full semantic interpretation of questions and documents, on which logical inferences are based (state of the art: mostly shallow methods)


Page 10: IRSAW - Towards Semantic Annotation for Question Answering


IRSAW – Methods and Modules (2/3)

Combine different data streams containing answer candidates

→ Use different methods to produce answer streams to increase recall and robustness

Logically validate answers

→ Select validated answers from streams of answer candidates to increase precision


Page 11: IRSAW - Towards Semantic Annotation for Question Answering


IRSAW – Methods and Modules (3/3)

Natural language generation of answers

→ Allows for rephrasing from text and combination of answer fragments from different documents (state of the art: extracting snippets from the text)

IRSAW also aims at investigating linguistic phenomena in questions and documents (e.g. idioms, metonymy, and temporal and spatial aspects)


Page 12: IRSAW - Towards Semantic Annotation for Question Answering


IRSAW – Software

The IRSAW project will result in two software components accessible via the internet:

1 The question answering system IRSAW
2 A web service for the semantic annotation of (web) documents


Page 13: IRSAW - Towards Semantic Annotation for Question Answering


MultiNet: Example of a Semantic Network

In which year did Charles de Gaulle die? / In welchem Jahr starb Charles de Gaulle?

[Semantic network diagram: the question is represented as a MultiNet graph. A node for the dying event (SUBS sterben, TEMP past) is linked via AFF to a node for the person de Gaulle (SUB Mensch, fact real), which carries ATTR links to a surname node (SUB Nachname, VAL De_Gaulle) and a first-name node (SUB Vorname, VAL Charles); a TEMP edge points to the queried year node (SUB Jahr), marked as the wh-question focus.]


Page 14: IRSAW - Towards Semantic Annotation for Question Answering


MultiNet: Tools and Resources

Syntactic-semantic parser: WOCADI (Word Class Controlled Disambiguating Parser)
Large semantic computational lexicon: HaGenLex (Hagen German Lexicon)
Workbench for the computer lexicographer: LiaPlus (Lexicon in action)


Page 15: IRSAW - Towards Semantic Annotation for Question Answering


IRSAW: First Processing Phase

Transform the user question into an IR query
Preselect information resources (→ Broker)
Send the IR query to web search engines and web portals (→ lists of URLs)
Retrieve the referenced web documents and convert them
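A rough Python sketch of this phase; the stopword heuristic and the engine interface are assumptions for illustration, not IRSAW's actual interfaces:

import re

STOPWORDS = {"in", "which", "did", "the", "a", "an", "of", "who", "what"}

def question_to_ir_query(question):
    # Keep the content words of the question as search terms.
    words = re.findall(r"\w+", question.lower())
    return " ".join(w for w in words if w not in STOPWORDS)

def first_phase(question, engines):
    query = question_to_ir_query(question)
    urls = []
    for engine in engines:                 # resources preselected by the Broker
        urls.extend(engine.search(query))  # hypothetical engine interface
    return urls  # the referenced documents are then fetched and converted

print(question_to_ir_query("In which year did Charles de Gaulle die?"))
# -> year charles de gaulle die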


Page 16: IRSAW - Towards Semantic Annotation for Question Answering


IRSAW: Second Processing Phase

Segment and index text passages from the web in a local database
Access to units of textual information of certain types (chapters, paragraphs, sentences, or phrases)
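An illustrative sketch of passage segmentation and indexing (a toy in-memory inverted index, not the project's database):

from collections import defaultdict

def segment(text):
    # Paragraph-level units; sentence- or phrase-level segmentation
    # would work analogously.
    return [p.strip() for p in text.split("\n\n") if p.strip()]

def build_index(documents):
    passages, index = [], defaultdict(set)
    for doc in documents:
        for passage in segment(doc):
            pid = len(passages)
            passages.append(passage)
            for token in passage.lower().split():
                index[token].add(pid)
    return passages, index

def find_passages(passages, index, keywords):
    # Passages containing all keywords (keywords must be non-empty).
    hits = set.intersection(*(index[k] for k in keywords))
    return [passages[i] for i in sorted(hits)]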


Page 17: IRSAW - Towards Semantic Annotation for Question Answering


IRSAW: Third Processing Phase

Employ different modules to produce data streams containing answer candidates:

QAP (Question Answering by Pattern matching),
MIRA (Modified Information Retrieval Approach), and
InSicht

Merge, rank, and logically validate answer candidates and select the best answer (MAVE)
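A minimal sketch of how this phase could be wired together, with hypothetical producer and validator functions standing in for QAP, MIRA, InSicht, and MAVE:

def answer_question(question, producers, validate):
    # producers: answer-candidate generators, e.g. QAP, MIRA, InSicht;
    # each returns (answer, supporting_passage) pairs.
    candidates = []
    for produce in producers:
        candidates.extend(produce(question))        # merge answer streams
    scored = sorted(((validate(question, a, p), a)  # MAVE-style scores
                     for a, p in candidates), reverse=True)
    return scored[0][1] if scored else None         # select the best answer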


Page 18: IRSAW - Towards Semantic Annotation for Question Answering


InSicht

Analyze text segments (question, texts) with WOCADI and return the representation of the meaning of a text as a semantic network
Expand queries with semantically related concepts (see the sketch below)
→ High recall
Paraphrase the answer node in the semantic network (generate the answer)
Match semantic networks
→ High precision
+ Co-reference resolution, logical inference rules/textual entailments
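A toy sketch of concept-level query expansion; the RELATED mapping is a made-up excerpt of what a HaGenLex-like resource would supply:

RELATED = {  # toy excerpt; a real lexicon would supply these entries
    "sterben.1.1": ["versterben.1.1", "umkommen.1.1"],  # die
}

def expand_query_concepts(concepts, related=RELATED):
    expanded = set(concepts)
    for concept in concepts:
        expanded.update(related.get(concept, []))
    return expanded

print(sorted(expand_query_concepts({"sterben.1.1", "jahr.1.1"})))
# ['jahr.1.1', 'sterben.1.1', 'umkommen.1.1', 'versterben.1.1']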


Page 19: IRSAW - Towards Semantic Annotation for Question Answering


InSicht Logical Entailment

((rule ((subs ?n1 "ermorden.1.1")  ;; kill
        (aff ?n1 ?n2)
        ->
        (subs ?n3 "sterben.1.1")   ;; die
        (aff ?n3 ?n2)))
 (ktype categ)
 (name "ermorden.1.1_entailment"))
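Read as: whoever is murdered also dies. A small Python sketch (illustrative, not InSicht's inference engine) of applying this rule to a network given as (relation, arg1, arg2) facts:

import itertools

_fresh = itertools.count()

def apply_entailment(facts):
    # facts: set of (relation, arg1, arg2) triples over network nodes.
    derived = set(facts)
    for rel, n1, concept in facts:
        if rel == "subs" and concept == "ermorden.1.1":
            for rel2, m, n2 in facts:
                if rel2 == "aff" and m == n1:
                    n3 = f"c{next(_fresh)}"  # fresh node for the dying event
                    derived.add(("subs", n3, "sterben.1.1"))
                    derived.add(("aff", n3, n2))
    return derived

facts = {("subs", "c1", "ermorden.1.1"), ("aff", "c1", "c2")}
print(sorted(apply_entailment(facts) - facts))
# [('aff', 'c0', 'c2'), ('subs', 'c0', 'sterben.1.1')]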


Page 20: IRSAW - Towards Semantic Annotation for Question Answering


InSicht Example

User question: In which year did Charles de Gaulle die? / In welchem Jahr starb Charles de Gaulle?
Text passage: France's chief of state Jacques Chirac acknowledged the merits of general and statesman Charles de Gaulle, who died 25 years ago. / Frankreichs Staatschef Jacques Chirac hat die Verdienste des vor 25 Jahren gestorbenen Generals und Staatsmannes Charles de Gaulle gewürdigt. (SDA.951109.0236)
Answer: 1970 (deictic temporal expression resolved; document written in 1995)
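A minimal sketch of the deictic resolution step, assuming the document's publication year is known; it handles only this one pattern:

import re

def resolve_years_ago(expression, document_year):
    # Handles only the "vor N Jahren" / "N years ago" pattern.
    match = re.search(r"vor (\d+) Jahren|(\d+) years ago", expression)
    if match:
        n = int(match.group(1) or match.group(2))
        return document_year - n
    return None

print(resolve_years_ago("vor 25 Jahren gestorben", 1995))  # -> 1970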


Page 21: IRSAW - Towards Semantic Annotation for Question Answering


QAP - Question Answering by Pattern Matching

Create patterns by processing known question-answer pairs
Search for text passages containing keywords from the question
Apply pattern matching on answer candidates
Extract the answer string from the variable binding

+ Robustness, high precision for a small class of questions

– No logical inferences possible
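A toy illustration of the pattern-matching idea (the pattern itself is made up for this example, not one of QAP's learned patterns):

import re

def qap_match(passage, key_phrase):
    # For "In which year ..." questions: bind a four-digit year
    # occurring right after the question's key phrase.
    pattern = re.compile(re.escape(key_phrase) + r"\D{0,20}(\d{4})")
    match = pattern.search(passage)
    return match.group(1) if match else None  # answer from the binding

passage = ("Die von der Russischen Revolution 1917 inspirierte Satire "
           "läßt den Traum von Freiheit und Gleichheit scheitern.")
print(qap_match(passage, "Russischen Revolution"))  # -> 1917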


Page 22: IRSAW - Towards Semantic Annotation for Question Answering


QAP Example

User question: In which year was the Russian Revolution? / In welchem Jahr fand die russische Revolution statt?
Text passage: The satire inspired by the Russian Revolution of 1917 lets the dream of liberty and equality fail because of humans. / Die von der Russischen Revolution 1917 inspirierte Satire läßt den Traum von Freiheit und Gleichheit an den Menschen scheitern. (FR940612-000533)
Answer: 1917 (the pattern matching subsystem ignores metonymy and ellipsis)


Page 23: IRSAW - Towards Semantic Annotation for Question Answering


MIRA - Modified Information Retrieval Approach

Train a tagger on answer classes (LOC, PER, ORG)
Search for text passages containing keywords from the question
Use the tagger on the answer candidate sentence and select the most frequent word sequence

+ Highly recall-oriented
– Very low precision; works only for a small class of questions (with answer type LOC, PER, ORG)
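A sketch of the selection step, with hand-written pairs standing in for the trained tagger's output:

from collections import Counter

def mira_select(tagged_candidates, expected_type):
    # tagged_candidates: (word_sequence, tag) pairs produced by the
    # tagger over the candidate sentences; pick the most frequent
    # sequence of the expected answer type.
    counts = Counter(seq for seq, tag in tagged_candidates
                     if tag == expected_type)
    return counts.most_common(1)[0][0] if counts else None

tagged = [("Neil Armstrong", "PER"), ("Mond", "LOC"),
          ("Neil Armstrong", "PER")]
print(mira_select(tagged, "PER"))  # -> Neil Armstrong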


Page 24: IRSAW - Towards Semantic Annotation for Question Answering


MIRA Example

User question: Who was the first man on the moon? / Wer war der erste Mensch auf dem Mond?
Text passage: Twenty-five years ago Neil Armstrong was the first man to step onto the moon, but today manned space flight stagnates. / Vor 25 Jahren betrat Neil Armstrong als erster Mensch den Mond, doch heute stagniert die bemannte Raumfahrt. (FR940724-001243)
Answer: Neil Armstrong (PER)


Page 25: IRSAW - Towards Semantic Annotation for Question Answering


MAVE - MultiNet-based Answer Verification

Validate answer candidates
Test the logical validity of an answer candidate (using inferences, entailments)
Add heuristic quality indicators as a fallback strategy
Select the most trusted answer
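A minimal sketch of this selection strategy; the scoring scheme and function names are assumptions for illustration, not MAVE's actual model:

def mave_score(candidate, proves, quality):
    # proves(candidate): True if question and passage logically entail
    # the answer; quality(candidate): heuristic indicator in [0, 1].
    if proves(candidate):
        return 1.0 + quality(candidate)  # validated answers rank first
    return quality(candidate)            # fallback: heuristics only

def select_most_trusted(candidates, proves, quality):
    return max(candidates,
               key=lambda c: mave_score(c, proves, quality),
               default=None)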


Page 26: IRSAW - Towards Semantic Annotation for Question Answering


Evaluations

InSicht evaluation: best performance for the monolingual German question answering task at the Cross Language Evaluation Forum 2005 (QA@CLEF 2005)
IRSAW evaluation at QA@CLEF 2006 (combination of the InSicht and QAP answer streams):
one of the best results in the monolingual German QA track;
best results for the answer validation task with MAVE
IRSAW evaluation (for RIAO 2007): InSicht, QAP, and MIRA answer streams, and logical validation with MAVE
→ better results with more answer streams and logical answer validation


Page 27: IRSAW - Towards Semantic Annotation for Question Answering


Evaluation Results

Results for answer validation of answer candidates for 600 questions (InSicht: I, MIRA: M, QAP: Q; c = correct, i = inexact, w = wrong)

QA streams                  c       i      w
IRSAW: I                    199.4   10.9   15.7
IRSAW: I+M+Q                244.4   16.9   255.7
IRSAW: I+M+Q (Optimum)      290.0   15.0   215.0


Page 28: IRSAW - Towards Semantic Annotation for Question Answering


Project Status

Implementation of the modules for the QA system IRSAW completed
Evaluations are now based on a corpus of newspaper articles / Wikipedia (corpora of millions of sentences)
Semantic annotation ⇒ Semantic Web


Page 29: IRSAW - Towards Semantic Annotation for Question Answering


Future Work

Next: evaluation on web pages (even bigger corpus, dynamic content)
Add more robustness
Add an acoustic interface (speech input)
Create an English prototype (?)


Page 30: IRSAW - Towards Semantic Annotation for Question Answering


Selected References

Ingo Glöckner, Sven Hartrumpf, and Johannes Leveling. Logical validation, answer merging and witness selection – a case study in multi-stream question answering. To appear in: Proceedings of RIAO 2007, 2007.

Sven Hartrumpf and Johannes Leveling. University of Hagen at QA@CLEF 2006: Interpretation and normalization of temporal expressions. In Alessandro Nardi, Carol Peters, and José Luis Vicedo, editors, Results of the CLEF 2006 Cross-Language System Evaluation Campaign, Working Notes for the CLEF 2006 Workshop, Alicante, Spain, 2006.

Hermann Helbig. Knowledge Representation and the Semantics of Natural Language. Springer, Berlin, 2006.
