Lymba (QA System)ad-teaching.informatik.uni-freiburg.de/information-extraction-ws1314… · Lymba...

26
Lymba (QA System) Fabian Schillinger [email protected] 15.01.2014 University of Freiburg Chair of Algorithms and Data Structures Seminar Information Extraction

Transcript of Lymba (QA System)ad-teaching.informatik.uni-freiburg.de/information-extraction-ws1314… · Lymba...

Page 1: Lymba (QA System)ad-teaching.informatik.uni-freiburg.de/information-extraction-ws1314… · Lymba – Fabian Schillinger 11/26 ROSE – a new NER System Uses 3-step process: Pass

Lymba (QA System)

Fabian [email protected]

15.01.2014

University of FreiburgChair of Algorithms and Data Structures

Seminar Information Extraction

Page 2: Lymba (QA System)ad-teaching.informatik.uni-freiburg.de/information-extraction-ws1314… · Lymba – Fabian Schillinger 11/26 ROSE – a new NER System Uses 3-step process: Pass

Lymba – Fabian Schillinger 2/26

TREC 2007

● Question anwering track● Blog data & Newswire documents● Factoid questions

"How many calories are there in a Big Mac?"● List questions

"List the names of chewing gums."● "Other" questions

interesting facts about some target

Page 3: Lymba (QA System)ad-teaching.informatik.uni-freiburg.de/information-extraction-ws1314… · Lymba – Fabian Schillinger 11/26 ROSE – a new NER System Uses 3-step process: Pass

Lymba – Fabian Schillinger 3/26

PowerAnswer 4 Overview

● Distributed strategy-based QA-System● Strategy is collection of the components

● Question Processing (QP)● Passage Retrieval (PR)● Answer Processing (AP)

Page 4: Lymba (QA System)ad-teaching.informatik.uni-freiburg.de/information-extraction-ws1314… · Lymba – Fabian Schillinger 11/26 ROSE – a new NER System Uses 3-step process: Pass

Lymba – Fabian Schillinger 4/26

PowerAnswer 4 Overview

● determine temporal constraints

● detect the expected answer type

● select the keywords used in retrieving relevant passages

● decide which question class to use

Page 5: Lymba (QA System)ad-teaching.informatik.uni-freiburg.de/information-extraction-ws1314… · Lymba – Fabian Schillinger 11/26 ROSE – a new NER System Uses 3-step process: Pass

Lymba – Fabian Schillinger 5/26

PowerAnswer 4 Overview

● ranks passages that are retrieved by the IR system

Page 6: Lymba (QA System)ad-teaching.informatik.uni-freiburg.de/information-extraction-ws1314… · Lymba – Fabian Schillinger 11/26 ROSE – a new NER System Uses 3-step process: Pass

Lymba – Fabian Schillinger 6/26

PowerAnswer 4 Overview

● extracts and scores the candidate answers

Page 7: Lymba (QA System)ad-teaching.informatik.uni-freiburg.de/information-extraction-ws1314… · Lymba – Fabian Schillinger 11/26 ROSE – a new NER System Uses 3-step process: Pass

Lymba – Fabian Schillinger 7/26

Question Answering over Blog Data

● 175 GB of blog data● with surrounding HTML/XML

● parsed to identify unique content● language detection to remove non-english text,

spam and empty entries● 13.1 GB of data (92.5% reduction)

Page 8: Lymba (QA System)ad-teaching.informatik.uni-freiburg.de/information-extraction-ws1314… · Lymba – Fabian Schillinger 11/26 ROSE – a new NER System Uses 3-step process: Pass

Lymba – Fabian Schillinger 8/26

Temporal Event Processing

● Concept Tagger Module● Detects events in question or candidate passage● Labels them

Event Class Question

Occurrencemarry Who is he planning to marry?

Stateheld In what city were … held?

Aspectualbegin On what date did the court begin?

Page 9: Lymba (QA System)ad-teaching.informatik.uni-freiburg.de/information-extraction-ws1314… · Lymba – Fabian Schillinger 11/26 ROSE – a new NER System Uses 3-step process: Pass

Lymba – Fabian Schillinger 9/26

Temporal Event Processing

● Concept Tagger Module● Detects events in question or candidate passage● Labels them● Identifies temporal expressions

– Absolute dates– Times– Durations

Expression Question

Absolute Date2004 What company acquired IMG in 2004?

DurationThree months In three months following ...

SetsEach year How many grants … each year?

Page 10: Lymba (QA System)ad-teaching.informatik.uni-freiburg.de/information-extraction-ws1314… · Lymba – Fabian Schillinger 11/26 ROSE – a new NER System Uses 3-step process: Pass

Lymba – Fabian Schillinger 10/26

Temporal Event Processing

● Concept Tagger Module● Uses set of rules working on full parse tree of text● All temporal expressions normalized

Q249.5: How many grants does the Fulbright Program award each year?

P: The program named after the former Senator J. William Fulbright awards approximately 4,500 new grants annually.

Page 11: Lymba (QA System)ad-teaching.informatik.uni-freiburg.de/information-extraction-ws1314… · Lymba – Fabian Schillinger 11/26 ROSE – a new NER System Uses 3-step process: Pass

Lymba – Fabian Schillinger 11/26

ROSE – a new NER System

● Uses 3-step process:● Pass text through pattern based grammar system● Pass grammar annotated data through ML system● Perform partial matching on the text

Page 12: Lymba (QA System)ad-teaching.informatik.uni-freiburg.de/information-extraction-ws1314… · Lymba – Fabian Schillinger 11/26 ROSE – a new NER System Uses 3-step process: Pass

Lymba – Fabian Schillinger 12/26

Answer Likelihood for Factoid Answer Selection

● Goal● Group questions from previous TRECs into classes● Build language model for each class on features

extracted from Question and Answer

Page 13: Lymba (QA System)ad-teaching.informatik.uni-freiburg.de/information-extraction-ws1314… · Lymba – Fabian Schillinger 11/26 ROSE – a new NER System Uses 3-step process: Pass

Lymba – Fabian Schillinger 13/26

Answer Likelihood for Factoid Answer Selection

● Three methods● Generate REGEX-style paraphrases and group

them together by paraphrase identifiers● Use hierarchical clustering based on

– Expected answer type– Most relevant keywords– Named entity types

● Group by answer type

Page 14: Lymba (QA System)ad-teaching.informatik.uni-freiburg.de/information-extraction-ws1314… · Lymba – Fabian Schillinger 11/26 ROSE – a new NER System Uses 3-step process: Pass

Lymba – Fabian Schillinger 14/26

Answer Likelihood for Factoid Answer Selection

● For questions in classes and correctly judged answers the following features were extracted● Stemmed keywords● Morphological alternations for keywords● Named entity tags

Page 15: Lymba (QA System)ad-teaching.informatik.uni-freiburg.de/information-extraction-ws1314… · Lymba – Fabian Schillinger 11/26 ROSE – a new NER System Uses 3-step process: Pass

Lymba – Fabian Schillinger 15/26

Answer Likelihood for Factoid Answer Selection

● Implemented in two stages:● During question processing● During answer processing

● Use score of answer likelihood to re-rank candidates

● Best observations with grouping the questions by answer type

Page 16: Lymba (QA System)ad-teaching.informatik.uni-freiburg.de/information-extraction-ws1314… · Lymba – Fabian Schillinger 11/26 ROSE – a new NER System Uses 3-step process: Pass

Lymba – Fabian Schillinger 16/26

List Questions

● Strategy● Try to maximize recall by returning as many

answers as possible during passage retrieval using– Lexico-semantic alternations– Relaxing the query to include

● Target keywords● Most relevant keywords from primary question text

● But how to filter answers?

Page 17: Lymba (QA System)ad-teaching.informatik.uni-freiburg.de/information-extraction-ws1314… · Lymba – Fabian Schillinger 11/26 ROSE – a new NER System Uses 3-step process: Pass

Lymba – Fabian Schillinger 17/26

List Questions

● Strategy● Try to maximize recall by returning as many

answers as possible during passage retrieval using– Lexico-semantic alternations– Relaxing the query to include

● Target keywords● Most relevant keywords from primary question text

● But how to filter answers?– Use lists from Wikipedia– Integrate COGEX

Page 18: Lymba (QA System)ad-teaching.informatik.uni-freiburg.de/information-extraction-ws1314… · Lymba – Fabian Schillinger 11/26 ROSE – a new NER System Uses 3-step process: Pass

Lymba – Fabian Schillinger 18/26

List Questions

● external data for specialized answer types● Bots for Amazon.com, imdb.com if question was in

domain of books, songs or movies● Bot for Wikipedia

– Google "I'm feeling lucky" to locate relevant articles

Page 19: Lymba (QA System)ad-teaching.informatik.uni-freiburg.de/information-extraction-ws1314… · Lymba – Fabian Schillinger 11/26 ROSE – a new NER System Uses 3-step process: Pass

Lymba – Fabian Schillinger 19/26

List Questions

● COGEX for list questions● Potential candidates from passage retrieval● Each candidate is hypothesized to be answer● COGEX checks if assertion is entailed by

corresponding candidate answer passage● Only candidates with entailment score over some

threshold are returned as valid answers

Page 20: Lymba (QA System)ad-teaching.informatik.uni-freiburg.de/information-extraction-ws1314… · Lymba – Fabian Schillinger 11/26 ROSE – a new NER System Uses 3-step process: Pass

Lymba – Fabian Schillinger 20/26

"Other" Questions

● The challenge is selection of interesting and novel nuggets from large corpus● Definition pattern matching module● List of over 200 positive and negative pre-computed

patterns

Extended by● Hierarchy of nugget patterns and automatically

derived generic answer patterns

Page 21: Lymba (QA System)ad-teaching.informatik.uni-freiburg.de/information-extraction-ws1314… · Lymba – Fabian Schillinger 11/26 ROSE – a new NER System Uses 3-step process: Pass

Lymba – Fabian Schillinger 21/26

"Other" Questions

● Nugget hierarchy based on question classes from previous TREC question sets● 35 target classes

– Animal, actor, musician, literature...● Each class is associated with a set of minimal

information– Person pattern: full name, birth, death, place of birth,

residence, occupation, etc.– Event: begin time, end time, duration, location,

participants, etc.

Page 22: Lymba (QA System)ad-teaching.informatik.uni-freiburg.de/information-extraction-ws1314… · Lymba – Fabian Schillinger 11/26 ROSE – a new NER System Uses 3-step process: Pass

Lymba – Fabian Schillinger 22/26

"Other" Questions

● Example patterns:● _nationality _profession _var

German chancellor Angela Merkel● _var ( _nationality _profession

Angela Merkel (Germany's chancellor● _nationality _JJ _profession _var

Germany's first female chancellor Angela Merkel

Page 23: Lymba (QA System)ad-teaching.informatik.uni-freiburg.de/information-extraction-ws1314… · Lymba – Fabian Schillinger 11/26 ROSE – a new NER System Uses 3-step process: Pass

Lymba – Fabian Schillinger 23/26

Results

● Factoid answer selectionRun Tag Submitter Accuracy

LymbaPA07 Lymba Corporation 0.706

LCCFerret Language Computer Corporation 0.494

lsv2007c Saarland University 0.289

Page 24: Lymba (QA System)ad-teaching.informatik.uni-freiburg.de/information-extraction-ws1314… · Lymba – Fabian Schillinger 11/26 ROSE – a new NER System Uses 3-step process: Pass

Lymba – Fabian Schillinger 24/26

Results

● List questionsRun Tag Submitter F-Score

LymbaPA07 Lymba Corporation 0.479

LCCFerret Language Computer Corporation 0.324

ILQUA1 State University of New York (SUNY) at Albany

0.147

Page 25: Lymba (QA System)ad-teaching.informatik.uni-freiburg.de/information-extraction-ws1314… · Lymba – Fabian Schillinger 11/26 ROSE – a new NER System Uses 3-step process: Pass

Lymba – Fabian Schillinger 25/26

Results

● "other" questionsRun Tag Submitter F-Score (beta=3)

FDUQAT16B Fudan University 0.329

lsv2007c Saarland University 0.299

QASCU2 Concordia University 0.281

LymbaPA07 Lymba Corporation 0.281

LCCFerret Language Computer Corporation 0.261

Page 26: Lymba (QA System)ad-teaching.informatik.uni-freiburg.de/information-extraction-ws1314… · Lymba – Fabian Schillinger 11/26 ROSE – a new NER System Uses 3-step process: Pass

Lymba – Fabian Schillinger 26/26

Thank you for your attention.

Any questions?