Strategies for Advanced Question Answering Sanda Harabagiu & Finley Lacatusu Language Computer Corporation HLT-NAACL2004 Workshop

Strategies for Advanced Question Answering

Sanda Harabagiu & Finley Lacatusu

Language Computer Corporation

HLT-NAACL2004 Workshop

Abstract

• Combining multiple strategies that optimally resolve different question classes of various degrees of complexity

• Enhancing the precision of question interpretation and answer extraction

• Question decomposition and answer fusion

Introduction

Our fundamental premise is that progress in Q/A cannot be achieved only by enhancing the processing components, but it also requires generating the best strategies for processing each individual question. Thus we believe that Q/A systems capable of successfully processing complex questions should employ multiple strategies instead of the current pipeline approach.

Q/A Systems Capable

• Pipeline architecture
  – Question processing
  – Passage retrieval
  – Answer selection

• Combining strategies for advanced QA
  – Knowledge-based Q/A implementation
  – Statistical noisy-channel algorithm for Q/A
  – Pattern-based approach that learns from the Web

Optimal Strategies of Advanced QA

• Question Decomposition
• Answer Fusion
• Feedback from Interactive Q&A
• User Background Recognition

Instance (1/2)

• How have thefts impacted on the safety of Russia’s nuclear navy, and has the theft problem been increased or decreased over time?
  – What specific instances of theft do we know about?
  – What sort of items have been stolen?

Instance (2/2)

• The decompositions
  – Who are the perpetrators of these thefts?
  – Do thefts have an economic impact on the naval bases?
• The concepts that need to be understood
  – What is meant by nuclear navy?
  – What does ‘impact’ mean?

Decomposition Criteria (1/4)

1. Decompositions along the constituents they coordinate
  – at question stem level
    • When and where did the thefts occur?
  – at predicate level
    • How does one define an increase or a decrease in the theft problem?
  – at argument level
    • To what degree do different thefts put nuclear or radioactive materials at risk?
  – at question level
    • What specific instances of theft do we know about, and what are the sources of this information?

Decomposition Criteria (2/4)

• Question decomposition by identifying coordinations
  – disambiguation of conjunctives for identifying when they indicate separate questions as opposed to when they just coordinate constituents
  – reference and ellipsis resolution of anaphoric expressions in the original question
  – recognition of the relations between the resulting, decomposed questions

Decomposition Criteria (3/4)

2. The question asks about
  – a complex relation
  – comparison with similar situations
  – elaboration of a state of affairs

• Determines the decomposition into
  – definition question
  – specializations of the predicate-concept
  – examples

Decomposition Criteria (4/4)

3. Elaborations of its arguments
  – nested predicate-argument structures
  – quantifications
  – instantiations

Large Database

• The QA pairs need to be diverse in terms of difficulty, where difficulty can be defined in terms of answer type complexity, answer granularity, and ease of matching.

• The QA pairs should be reliable, i.e. each question must be associated with a correct answer.

Our Solution

• Combination of collection and generation from semi-structured resources, followed by expansion and validation.

• Generate the collection of QA pairs from Frequently Asked Questions (FAQ) files on various topics.

• Develop a dedicated harvesting algorithm to identify FAQs on the Web and extract the QA pairs.

Answer Resolution Strategies

Overview

• Answer Fusion, Ranking and Reliability
• Bootstrapping Question Answering
• User Background
• Processing Negation in Question Answering
• Conclusions

Answer Fusion, Ranking and Reliability

• An open-domain, template-based answer formalization

• A probabilistic model

• A set of template merging operators

Open-domain Template Representation

Detection of Template Relations

• A novel matching approach based on template attributes that support relation detection for merging.

• The approach combines phrasal parsing, lemma normalization and semantic approximation

Fusion Operators

Answer Fusion Block Architecture

Bootstrapping Question Answering

• What weapons of mass destruction (WMD) does Iraq have?
  – answer type “WMD”
  – accepts concepts such as “anthrax”

• Exact answer
  – LCC’s system
  – answer type (AT)

Answer Instance Bootstrapping Algorithm

Example

• What viral agent was used in Iraq?
  – If the answer type concept does not exist in WordNet
    • the bootstrapping algorithm will create a distinct category for this concept.
  – If the answer type concept exists in WordNet
    • the algorithm attaches the bootstrapped entities and patterns to the concept hypernym that provides the largest coverage without overlapping any other known categories
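
The attachment rule above can be illustrated with a small sketch. Everything here is an assumption made for illustration: a hand-written dictionary stands in for WordNet, "largest coverage" is read as "the most general hypernym", and the function names are hypothetical rather than taken from the authors' system.

```python
from typing import Dict, List, Optional, Set

# Hypothetical toy taxonomy standing in for WordNet: concept -> direct hypernym.
HYPERNYM: Dict[str, Optional[str]] = {
    "viral agent": "biological agent",
    "biological agent": "agent",
    "agent": "entity",
    "weapon": "entity",
    "entity": None,
}


def hypernym_chain(concept: str) -> List[str]:
    """Return the concept followed by its hypernyms, most specific first."""
    chain: List[str] = []
    node: Optional[str] = concept
    while node is not None:
        chain.append(node)
        node = HYPERNYM.get(node)
    return chain


def descendants(concept: str) -> Set[str]:
    """All taxonomy concepts that have `concept` on their hypernym chain."""
    return {c for c in HYPERNYM if concept in hypernym_chain(c)}


def attachment_point(at_concept: str, known_categories: Set[str]) -> str:
    """Pick the concept that bootstrapped entities and patterns attach to."""
    if at_concept not in HYPERNYM:
        # Concept is missing from the taxonomy: it becomes its own category.
        return at_concept
    best = at_concept
    for candidate in hypernym_chain(at_concept):
        # Stop generalizing once the candidate would subsume a known category.
        if descendants(candidate) & known_categories:
            break
        best = candidate
    return best


print(attachment_point("viral agent", {"weapon"}))
print(attachment_point("nerve gas", {"weapon"}))
```

With "weapon" as the only known category, the walk stops below "entity" because "entity" would also cover "weapon".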

Enhancing Retrieval, Navigation, And Fusion

• (biological AND agents AND Qaeda)
  – What biological agents does al Qaeda possess?
• The extensions to the AT ontology enable an intelligent query expansion
  – AT instances
  – extraction patterns

Expanded Query

• ((biological AND agents) OR (bacterial AND agent) OR (viral AND agent) OR (fungal AND agent) OR (toxic AND agent) OR botulism OR botulinum OR smallpox OR encephalitis OR (deploy)) AND (Qaeda)

• the conversion of extraction patterns into keywords
  – “deploy” for “deploy ANY-WMD”

• the controlled expansion through selective keyword selection
  – for “biological agents”
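
As a rough sketch of how such an expanded query could be assembled, the snippet below combines the seed phrase, answer-type variants, answer-type instances, and keywords derived from extraction patterns. The function names and the way the inputs are split into lists are assumptions for illustration; the slides do not describe the actual implementation.

```python
from typing import Iterable, List


def and_group(phrase: str) -> str:
    """Turn a multi-word phrase into a parenthesized AND group."""
    words = phrase.split()
    return words[0] if len(words) == 1 else "(" + " AND ".join(words) + ")"


def pattern_keywords(patterns: Iterable[str], slot: str = "ANY-WMD") -> List[str]:
    """Keep the lexical part of each extraction pattern, dropping the slot."""
    return [" ".join(w for w in p.split() if w != slot) for p in patterns]


def expand_query(seed: str, at_variants: List[str], at_instances: List[str],
                 patterns: List[str], anchor: str) -> str:
    """Build a Boolean query: (seed OR variants OR instances OR patterns) AND anchor."""
    alternatives = [and_group(seed)]
    alternatives += [and_group(v) for v in at_variants]
    alternatives += [and_group(i) for i in at_instances]
    alternatives += [f"({k})" for k in pattern_keywords(patterns) if k]
    return "(" + " OR ".join(alternatives) + ") AND (" + anchor + ")"


query = expand_query(
    seed="biological agents",
    at_variants=["bacterial agent", "viral agent", "fungal agent", "toxic agent"],
    at_instances=["botulism", "botulinum", "smallpox", "encephalitis"],
    patterns=["deploy ANY-WMD"],
    anchor="Qaeda",
)
print(query)
```

Called with the inputs shown, the function reproduces the expanded query listed on the slide.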

Continuous Updating of Scenario Knowledge

User Background

• All users are different: not only do they have different backgrounds and expertise, but they also vary in their goals and reasons for using a Q/A system.

Different user selections from the generated question decomposition tree

Assessing User Background

• We evaluate users via a discrete evaluation scale, which ranks users as novice, casual, or expert users based on how much background knowledge they have on the given topic

• For example, if the user is known to be an “expert”, only the paths generated through “expert” decomposition (i.e., generated using significant world and topic knowledge) will be followed.
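
One way to picture this selection is a decomposition tree whose nodes carry the level of knowledge used to generate them. The sketch below is only an illustration: the `Decomposition` class, the `paths_for` function, and the level labels attached to the example questions are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Decomposition:
    question: str
    level: str  # "novice", "casual", or "expert": knowledge used to generate it
    children: List["Decomposition"] = field(default_factory=list)


def paths_for(node: Decomposition, user_level: str,
              prefix: Tuple[str, ...] = ()) -> Iterator[Tuple[str, ...]]:
    """Yield question paths that only follow decompositions matching the user's level."""
    path = prefix + (node.question,)
    followed = [c for c in node.children if c.level == user_level]
    if not followed:
        yield path
        return
    for child in followed:
        yield from paths_for(child, user_level, path)


# Level labels below are invented for the example, not taken from the slides.
root = Decomposition(
    "How have thefts impacted on the safety of Russia's nuclear navy?",
    level="root",
    children=[
        Decomposition("What is meant by nuclear navy?", "novice"),
        Decomposition(
            "To what degree do different thefts put nuclear or radioactive materials at risk?",
            "expert",
        ),
    ],
)

for path in paths_for(root, "expert"):
    print(" -> ".join(path))
```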

Representing User Background

• Traditionally, the user profile has been represented as a term vector

• P = ((t1, w1), (t2, w2), …, (tn, wn))
  – each profile P is a vector of (term, weight) pairs
  – ti are terms from relevant documents
  – wi are term weights, typically computed with the tf * idf metric
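
A minimal sketch of building such a profile follows. It assumes one common tf * idf variant (raw term frequency times log(N / df)); the slides do not say which variant is used, and the function name and toy documents are made up for the example.

```python
import math
from collections import Counter
from typing import List, Tuple


def tfidf_profile(relevant_doc: List[str],
                  collection: List[List[str]]) -> List[Tuple[str, float]]:
    """Return P = ((t1, w1), ..., (tn, wn)) for one relevant document."""
    tf = Counter(relevant_doc)
    n_docs = len(collection)
    profile = []
    for term, count in tf.items():
        # Document frequency: number of documents containing the term.
        df = sum(1 for doc in collection if term in doc)
        idf = math.log(n_docs / df) if df else 0.0
        profile.append((term, count * idf))
    # Highest-weighted terms first; ties broken alphabetically.
    return sorted(profile, key=lambda pair: (-pair[1], pair[0]))


docs = [
    ["nuclear", "theft", "navy"],
    ["theft", "report"],
    ["weather", "report"],
]
for term, weight in tfidf_profile(docs[0], docs):
    print(f"{term}: {weight:.3f}")
```

Terms that occur in fewer documents of the collection receive higher weights, so "nuclear" and "navy" outrank "theft" in this toy example.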

Two Regards

• It is common for one user to explore multiple topics even during the same session
  – Pi = ((ti1, wi1), (ti2, wi2), …, (tim, wim)), i = 1, 2, …, n, where m is the size of vector Pi
• When a new document is marked as relevant, it is either
  – merged with an existing profile, if their similarity is higher than a given threshold, or
  – used to generate a new profile
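
The merge-or-create step can be sketched as follows. Several details are assumptions rather than statements from the slides: cosine similarity as the similarity measure, merging into the single most similar profile, additive merging of weights, and the threshold value of 0.3.

```python
import math
from typing import Dict, List

Vector = Dict[str, float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = (math.sqrt(sum(w * w for w in a.values()))
            * math.sqrt(sum(w * w for w in b.values())))
    return dot / norm if norm else 0.0


def update_profiles(profiles: List[Vector], doc: Vector,
                    threshold: float = 0.3) -> None:
    """Merge a relevant document into the closest profile, or start a new one."""
    best = max(profiles, key=lambda p: cosine(p, doc), default=None)
    if best is not None and cosine(best, doc) > threshold:
        # Similar enough: fold the document's weights into the existing profile.
        for term, weight in doc.items():
            best[term] = best.get(term, 0.0) + weight
    else:
        # No sufficiently similar profile: the document seeds a new topic.
        profiles.append(dict(doc))


profiles: List[Vector] = []
update_profiles(profiles, {"theft": 1.0, "navy": 1.0})
update_profiles(profiles, {"theft": 1.0, "submarine": 1.0})
update_profiles(profiles, {"anthrax": 1.0})
print(len(profiles), profiles)
```

In this run the second document shares a term with the first profile and is merged into it, while the third has no overlap and starts a separate profile.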

The Expert User’s Background

• Multiple vectors
• Each specializes in a clear, domain-specific direction

Second Innovation

• “al” is among the most frequent terms, but, by itself, “al” is considered a stop word by most information retrieval systems

• However, the significance of the term becomes evident when the complete concept, “al Qaeda”, is considered.
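
The effect can be illustrated by matching multi-word concepts before stop-word filtering, so that "al" survives as part of "al Qaeda". The stop-word list, the concept lexicon, and the function name below are hypothetical and kept deliberately tiny.

```python
from typing import List, Set, Tuple

STOP_WORDS: Set[str] = {"al", "the", "of", "is", "by"}   # illustrative only
CONCEPTS: Set[Tuple[str, ...]] = {("al", "qaeda")}        # hypothetical concept lexicon


def profile_terms(tokens: List[str],
                  concepts: Set[Tuple[str, ...]] = CONCEPTS,
                  stop_words: Set[str] = STOP_WORDS) -> List[str]:
    """Extract profile terms, keeping known multi-word concepts intact."""
    terms: List[str] = []
    max_len = max((len(c) for c in concepts), default=1)
    i = 0
    while i < len(tokens):
        # Try the longest concept first, down to two-word concepts.
        for n in range(max_len, 1, -1):
            window = tuple(t.lower() for t in tokens[i:i + n])
            if len(window) == n and window in concepts:
                terms.append(" ".join(tokens[i:i + n]))
                i += n
                break
        else:
            # No concept starts here: apply ordinary stop-word filtering.
            if tokens[i].lower() not in stop_words:
                terms.append(tokens[i])
            i += 1
    return terms


print(profile_terms("the threat posed by al Qaeda".split()))
```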

Processing Negation In Question Answering

• Previous Q/A systems
  – filtering the retrieved answer
  – eliminating answers

• Examples
  – Which countries did not vote for the Iraq war resolution in the Security Council?
  – Which countries did not provide help to the coalition during the Gulf War in 1991?
  – What planets have no moon?

Recognizing The Most Frequent Cases of Negation

• no
  – with no terrorists, the world would be safer
• nothing
  – the inspectors found nothing
• not
  – thefts did not occur at the beginning
• never
  – the president never leaves the White House without the Secret Service approval
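
A simple surface-level detector for these cues might look like the sketch below. The cue list is limited to the forms illustrated on this slide plus the contraction "n't", which is an addition made here; it is not exhaustive and says nothing about negation scope.

```python
import re
from typing import List

# Cues from the slide ("no", "nothing", "not", "never") plus "n't" (an assumption).
NEGATION_CUES = re.compile(r"\b(?:no|not|nothing|never)\b|n't\b", re.IGNORECASE)


def negation_cues(sentence: str) -> List[str]:
    """Return the negation cues found in a sentence, in order of appearance."""
    return [m.group(0).lower() for m in NEGATION_CUES.finditer(sentence)]


examples = [
    "with no terrorists, the world would be safer",
    "the inspectors found nothing",
    "thefts did not occur at the beginning",
    "the president never leaves the White House without the Secret Service approval",
    "What planets have no moon?",
]
for sentence in examples:
    print(negation_cues(sentence), "<-", sentence)
```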

Hypotheses

• We assume that when a speaker is formulating a question to find out whether a proposition is true or false, s/he formulates the question with the form of the proposition which would be the most informative if it turned out to be true.

• We expect that if a question has the form of negation, the speaker believes that the negative answer is the most informative.

How Negation Can Be Addressed In Q/A

• By using the user background
• By interacting with the user
• By finding cues from the answers to the positive question

Conclusions

• Question decompositions following several criteria
• Answer fusion which composes a unique, coherent answer from the partial answers extracted for each decomposed question
• Modeling of user background
• Processing of negation in questions and/or answers
• Bootstrapping algorithm that enhances the precision of factual Q/A
