© Johan Bos April 2008 Question Answering (QA) Lecture 1 What is QA? Query Log Analysis Challenges...

© Johan Bos April 2008 Question Answering (QA) Lecture 1 What is QA? Query Log Analysis Challenges in QA History of QA • System Architecture • Methods System Evaluation • State-of-the-art Lecture 2 Question Analysis • Background Knowledge Answer Typing Lecture 3 Query Generation Document Analysis Semantic Indexing Answer Extraction Selection and Ranking


Page 1:

Question Answering (QA)

Lecture 1
• What is QA?
• Query Log Analysis
• Challenges in QA
• History of QA
• System Architecture
• Methods
• System Evaluation
• State-of-the-art

Lecture 2
• Question Analysis
• Background Knowledge
• Answer Typing

Lecture 3
• Query Generation
• Document Analysis
• Semantic Indexing
• Answer Extraction
• Selection and Ranking

Page 2:

[Figure: Pronto architecture. Diagram labels: question, parsing, ccg, boxing, drs, knowledge, WordNet, NomLex, answer typing, query, Indri, Indexed Documents, answer extraction, answer selection, answer reranking, answer]

Page 3:

[Figure: Pronto architecture diagram, labelled "Lecture 3"]

Page 4:

Question Answering

Lecture 3

• Query Generation

• Document Analysis

• Semantic Indexing

• Answer Extraction

• Selection and Ranking


Page 6:

Query Generation

• Once we have analysed the question, we need to retrieve appropriate documents

• Most QA systems use an off-the-shelf information retrieval system for this task

• Examples:
– Lemur
– Lucene
– Indri (used by Pronto)

• The input of the IR system is a query; the output is a ranked set of documents

Page 7:

Queries

• Query generation depends on the way documents are indexed

• Based on
– Semantic analysis of the question
– Expected answer type
– Background knowledge

• Computing a good query is hard: we don’t want too few documents, and we don’t want too many!

Page 8:

Generating Query Terms

• Example 1:
– Question: Who discovered prions?
– Text A: Dr. Stanley Prusiner received the Nobel prize for the discovery of prions.
– Text B: Prions are a kind of protein that…

• Query terms?

Page 9:

Generating Query Terms

• Example 2:
– Question: When did Franz Kafka die?
– Text A: Kafka died in 1924.
– Text B: Dr. Franz died in 1971.

• Query terms?

Page 10:

Generating Query Terms

• Example 3:
– Question: How did actor James Dean die?
– Text: James Dean was killed in a car accident.

• Query terms?

Page 11:

Useful query terms

• Ranked by importance:
– Named entities
– Dates or time expressions
– Expressions in quotes
– Nouns
– Verbs

• Queries can be expanded using the created local knowledge base
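The ranking of query terms can be sketched in a few lines of Python. This is only an illustration, not Pronto's implementation: it assumes every question term has already been labelled with one of the five categories, and the category names and the helper `rank_query_terms` are hypothetical.

```python
from typing import List, Tuple

# Priority order taken from the slide: lower index = more important.
# The category names themselves are assumptions made for this sketch.
PRIORITY = ["named_entity", "time_expression", "quoted", "noun", "verb"]


def rank_query_terms(tagged_terms: List[Tuple[str, str]]) -> List[str]:
    """Order (term, category) pairs by category importance.

    sorted() is stable, so terms of the same category keep their
    original order in the question.
    """
    ordered = sorted(tagged_terms, key=lambda tc: PRIORITY.index(tc[1]))
    return [term for term, _ in ordered]


# Hand-labelled terms for "How did actor James Dean die?" (labels assumed).
terms = [("actor", "noun"), ("James Dean", "named_entity"), ("die", "verb")]
print(rank_query_terms(terms))
```

A query generator could then take the top few terms from this list and add lower-ranked ones only when too few documents come back.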

Page 12:

Query expansion example

• Query: sacajawea
– Returns only five documents

• Use synonyms in query expansion

• New query: sacajawea OR sagajawea
– Returns two hundred documents

TREC 44.6 (Sacajawea)

How much is the Sacajawea coin worth?
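As a rough sketch of the expansion step, the function below joins a term with a list of known variants using the `OR` notation shown on the slide. The helper name `expand_query` is hypothetical, the variant list is supplied by hand, and a real IR engine will have its own operator syntax.

```python
from typing import Iterable, List


def expand_query(term: str, variants: Iterable[str]) -> str:
    """Join a term and its variants into one disjunctive query string."""
    seen: List[str] = []
    for candidate in [term, *variants]:
        candidate = candidate.lower()
        if candidate not in seen:  # drop repeated variants
            seen.append(candidate)
    return " OR ".join(seen)


print(expand_query("sacajawea", ["sagajawea"]))
```

In practice the variants would come from the background knowledge collected during question analysis rather than from a hard-coded list.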


Page 14:

Question Answering

Lecture 3

• Query Generation

• Document Analysis

• Semantic Indexing

• Answer Extraction

• Selection and Ranking

Page 15:

Document Analysis – Why?

• The aim of QA is to output answers, not documents

• We need document analysis to
– Find the correct type of answer in the documents
– Calculate the probability that an answer is correct

• Semantic analysis is important to get valid answers

Page 16:

Document Analysis – When?

• After retrieval
– token or word based index
– keyword queries
– low precision

• Before retrieval
– semantic indexing
– concept queries
– high precision
– more NLP required

Page 17:

Document Analysis – How?

• Ideally use the same NLP tools as for question analysis
– This will make the semantic matching of question and answer easier
– Not always possible: wide-coverage tools are usually good at analysing text, but not at analysing questions
– Questions are often not part of the large annotated corpora on which NLP tools are trained

Page 18:

Documents vs Passages

• Split documents into smaller passages
– This will make the semantic matching faster and more accurate
– In Pronto the passage size is two sentences, implemented by a sliding window

• Passages that are too small risk losing important contextual information
– Pronouns and referring expressions
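A two-sentence sliding window could look like the sketch below. It assumes the document is already split into sentences and that the window advances one sentence at a time; the step size is not stated on the slide, and the function name `sliding_passages` is hypothetical.

```python
from typing import List


def sliding_passages(sentences: List[str], size: int = 2) -> List[str]:
    """Return overlapping passages of `size` consecutive sentences."""
    if not sentences:
        return []
    if len(sentences) <= size:
        # A short document yields a single passage.
        return [" ".join(sentences)]
    return [
        " ".join(sentences[i:i + size])
        for i in range(len(sentences) - size + 1)
    ]


doc = ["Kafka lived in Austria.", "He died in 1924.", "Max Brod knew him."]
for passage in sliding_passages(doc):
    print(passage)
```

Because consecutive windows overlap, a pronoun such as "He" always appears in at least one passage together with the sentence before it.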

Page 19:

Document Analysis

• Tokenisation

• Part of speech tagging

• Lemmatisation

• Syntactic analysis (Parsing)

• Semantic analysis (Boxing)

• Named entity recognition

• Anaphora resolution


Page 24:

Why semantics is important

• Example:
– Question: When did Franz Kafka die?
– Text A: The mother of Franz Kafka died in 1918.
– Text B: Kafka lived in Austria. He died in 1924.
– Text C: Both Kafka and Lenin died in 1924.
– Text D: Max Brod, who knew Kafka, died in 1930.
– Text E: Someone who knew Kafka died in 1930.

Page 25:

DRS for “The mother of Franz Kafka died in 1918.”

 _____________________
| x3 x4 x2 x1         |
|---------------------|
| mother(x3)          |
| named(x4,kafka,per) |
| named(x4,franz,per) |
| die(x2)             |
| thing(x1)           |
| event(x2)           |
| of(x3,x4)           |
| agent(x2,x3)        |
| in(x2,x1)           |
| timex(x1)=+1918XXXX |
|_____________________|

Page 26:

DRS for: “Kafka lived in Austria. He died in 1924.”

  _______________________     _____________________
 | x3 x1 x2              |   | x5 x4               |
 |-----------------------|   |---------------------|
(| male(x3)              | + | die(x5)             |)
 | named(x3,kafka,per)   |   | thing(x4)           |
 | live(x1)              |   | event(x5)           |
 | agent(x1,x3)          |   | agent(x5,x3)        |
 | named(x2,austria,loc) |   | in(x5,x4)           |
 | event(x1)             |   | timex(x4)=+1924XXXX |
 | in(x1,x2)             |   |_____________________|
 |_______________________|

Page 27:

DRS for: “Both Kafka and Lenin died in 1924.”

 _____________________
| x6 x5 x4 x3 x2 x1   |
|---------------------|
| named(x6,kafka,per) |
| die(x5)             |
| event(x5)           |
| agent(x5,x6)        |
| in(x5,x4)           |
| timex(x4)=+1924XXXX |
| named(x3,lenin,per) |
| die(x2)             |
| event(x2)           |
| agent(x2,x3)        |
| in(x2,x1)           |
| timex(x1)=+1924XXXX |
|_____________________|

Page 28:

DRS for: “Max Brod, who knew Kafka, died in 1930.”

 _____________________
| x3 x5 x4 x2 x1      |
|---------------------|
| named(x3,brod,per)  |
| named(x3,max,per)   |
| named(x5,kafka,per) |
| know(x4)            |
| event(x4)           |
| agent(x4,x3)        |
| patient(x4,x5)      |
| die(x2)             |
| event(x2)           |
| agent(x2,x3)        |
| in(x2,x1)           |
| timex(x1)=+1930XXXX |
|_____________________|

Page 29:

DRS for: “Someone who knew Kafka died in 1930.”

 _____________________
| x3 x5 x4 x2 x1      |
|---------------------|
| person(x3)          |
| named(x5,kafka,per) |
| know(x4)            |
| event(x4)           |
| agent(x4,x3)        |
| patient(x4,x5)      |
| die(x2)             |
| event(x2)           |
| agent(x2,x3)        |
| in(x2,x1)           |
| timex(x1)=+1930XXXX |
|_____________________|
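To see how these DRSs separate right from wrong answers, the sketch below stores two of them as flat lists of conditions and asks for the date of a dying event whose agent is named "kafka". The tuple encoding and the function `death_years` are assumptions made for illustration (the leading "+" of the timex value is dropped); this is not how Boxer or Pronto represent DRSs internally.

```python
from typing import List, Tuple

Condition = Tuple[str, Tuple[str, ...]]  # (predicate, arguments)


def death_years(name: str, drs: List[Condition]) -> List[str]:
    """Years of die-events whose agent is named `name` in a flat DRS."""
    named = {a[0] for p, a in drs if p == "named" and a[1] == name}
    dying = {a[0] for p, a in drs if p == "die"}
    times = {a[0]: a[1] for p, a in drs if p == "timex"}
    years = []
    for p, a in drs:
        if p == "agent" and a[0] in dying and a[1] in named:
            # Follow in(event, time) from the event to its time expression.
            for p2, a2 in drs:
                if p2 == "in" and a2[0] == a[0] and a2[1] in times:
                    years.append(times[a2[1]][:4])
    return years


# "The mother of Franz Kafka died in 1918."
TEXT_A: List[Condition] = [
    ("mother", ("x3",)), ("named", ("x4", "kafka", "per")),
    ("named", ("x4", "franz", "per")), ("die", ("x2",)), ("thing", ("x1",)),
    ("event", ("x2",)), ("of", ("x3", "x4")), ("agent", ("x2", "x3")),
    ("in", ("x2", "x1")), ("timex", ("x1", "1918XXXX")),
]

# "Both Kafka and Lenin died in 1924."
TEXT_C: List[Condition] = [
    ("named", ("x6", "kafka", "per")), ("die", ("x5",)), ("event", ("x5",)),
    ("agent", ("x5", "x6")), ("in", ("x5", "x4")), ("timex", ("x4", "1924XXXX")),
    ("named", ("x3", "lenin", "per")), ("die", ("x2",)), ("event", ("x2",)),
    ("agent", ("x2", "x3")), ("in", ("x2", "x1")), ("timex", ("x1", "1924XXXX")),
]

print(death_years("kafka", TEXT_A))
print(death_years("kafka", TEXT_C))
```

Text A contains "Kafka", "died" and a year, yet the agent of the dying event is the mother (x3), so no year is returned for Kafka. A keyword-based approach would not see that difference.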

Page 30:

Document Analysis

• Tokenisation

• Part of speech tagging

• Lemmatisation

• Syntactic analysis (Parsing)

• Semantic analysis (Boxing)

• Named entity recognition

• Anaphora resolution

Page 31:

Recall the Answer-Type Taxonomy

• We divided questions according to their expected answer type

• Simple Answer-Type Taxonomy

– PERSON
– NUMERAL
– DATE
– MEASURE
– LOCATION
– ORGANISATION
– ENTITY

Page 32:

Named Entity Recognition

• In order to make use of the answer types, we need to be able to recognise named entities of the same types in the documents

– PERSON
– NUMERAL
– DATE
– MEASURE
– LOCATION
– ORGANISATION
– ENTITY

Page 33:

Example Text

Italy’s business world was rocked by the announcement last Thursday that Mr. Verdi would leave his job as vice-president of Music Masters of Milan, Inc to become operations director of  Arthur Andersen. 


Page 35:

Named Entity Recognition

<ENAMEX TYPE="LOCATION">Italy</ENAMEX>’s business world was rocked by the announcement <TIMEX TYPE="DATE">last Thursday</TIMEX> that Mr. <ENAMEX TYPE="PERSON">Verdi</ENAMEX> would leave his job as vice-president of <ENAMEX TYPE="ORGANIZATION">Music Masters of Milan, Inc</ENAMEX> to become operations director of <ENAMEX TYPE="ORGANIZATION">Arthur Andersen</ENAMEX>.
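Once text carries this kind of markup, the typed entities can be read back with a regular expression. The sketch below assumes well-formed `ENAMEX`/`TIMEX` elements with straight double quotes around the `TYPE` value; the function name `extract_entities` is hypothetical.

```python
import re
from typing import List, Tuple

# The example sentence from the slide, with well-formed tags.
TAGGED = (
    '<ENAMEX TYPE="LOCATION">Italy</ENAMEX>\'s business world was rocked by the '
    'announcement <TIMEX TYPE="DATE">last Thursday</TIMEX> that Mr. '
    '<ENAMEX TYPE="PERSON">Verdi</ENAMEX> would leave his job as vice-president of '
    '<ENAMEX TYPE="ORGANIZATION">Music Masters of Milan, Inc</ENAMEX> to become '
    'operations director of <ENAMEX TYPE="ORGANIZATION">Arthur Andersen</ENAMEX>.'
)

# \1 forces the closing tag to match the opening tag name.
ENTITY = re.compile(r'<(ENAMEX|TIMEX) TYPE="([A-Z]+)">(.*?)</\1>')


def extract_entities(text: str) -> List[Tuple[str, str]]:
    """Return (type, surface string) pairs in order of appearance."""
    return [(m.group(2), m.group(3)) for m in ENTITY.finditer(text)]


for entity_type, surface in extract_entities(TAGGED):
    print(entity_type, "->", surface)
```

An answer extractor can then keep only the entities whose type equals the expected answer type of the question.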

Page 36:

NER difficulties

• Several types of entities are too numerous to include in dictionaries

• New names turn up every day

• Ambiguities – Paris, Lazio

• Different forms of the same entity in the same text
– Brian Jones … Mr. Jones

• Capitalisation

Page 37:

NER approaches

• Rule-based approaches
– Hand-crafted rules
– Help from databases of known named entities [e.g. locations]

• Statistical approaches
– Features
– Machine learning
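A rule-based recogniser can be as small as the sketch below, which combines one hand-crafted rule (a capitalised word after a title is a person) with a lookup in a toy list of known locations. Both the rule and the gazetteer contents are assumptions for illustration; real systems use many more rules and far larger resources.

```python
import re
from typing import List, Tuple

# Toy gazetteer of known locations (assumed for this sketch).
LOCATIONS = {"Italy", "Milan", "Paris"}

TITLE_RULE = re.compile(r"\b(?:Mr|Mrs|Ms|Dr)\.\s+([A-Z][a-z]+)")
CAPITALISED = re.compile(r"\b[A-Z][a-z]+\b")


def tag_entities(text: str) -> List[Tuple[str, str]]:
    """Return (surface string, type) pairs found by the two rules."""
    entities = []
    # Rule 1: a capitalised word directly after a title is a PERSON.
    for m in TITLE_RULE.finditer(text):
        entities.append((m.group(1), "PERSON"))
    # Rule 2: a capitalised word found in the gazetteer is a LOCATION.
    for m in CAPITALISED.finditer(text):
        if m.group(0) in LOCATIONS:
            entities.append((m.group(0), "LOCATION"))
    return entities


sentence = ("Italy's business world was rocked by the announcement last "
            "Thursday that Mr. Verdi would leave his job as vice-president "
            "of Music Masters of Milan, Inc")
print(tag_entities(sentence))
```

Note that "Milan" is tagged as a location here although in this sentence it is part of an organisation name, which is exactly the kind of ambiguity that makes NER difficult for purely dictionary-driven rules.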

Page 38:

Document Analysis

• Tokenisation

• Part of speech tagging

• Lemmatisation

• Syntactic analysis (Parsing)

• Semantic analysis (Boxing)

• Named entity recognition

• Anaphora resolution

Page 39:

What is anaphora?

• Relation between a pronoun and another element in the same or earlier sentence

• Anaphoric pronouns:
– he, him, she, her, it, they, them

• Anaphoric noun phrases:
– the country
– these documents
– his hat, her dress

Page 40:

Anaphora (pronouns)

• Question: What is the biggest sector in Andorra’s economy?

• Corpus: Andorra is a tiny land-locked country in southwestern Europe, between France and Spain. Tourism, the largest sector of its tiny, well-to-do economy, accounts for roughly 80% of the GDP.

• Answer: ?

Page 41:

Anaphora (definite descriptions)

• Question: What is the biggest sector in Andorra’s economy?

• Corpus: Andorra is a tiny land-locked country in southwestern Europe, between France and Spain. Tourism, the largest sector of the country’s tiny, well-to-do economy, accounts for roughly 80% of the GDP.

• Answer: ?

Page 42:

Anaphora Resolution

• Anaphora Resolution is the task of finding the antecedents of anaphoric expressions

• Example system:
– Mitkov, Evans & Orasan (2002)
– http://clg.wlv.ac.uk/MARS/

Page 43:

“Kafka lived in Austria. He died in 1924.”

  _______________________     _____________________
 | x3 x1 x2              |   | x5 x4               |
 |-----------------------|   |---------------------|
(| male(x3)              | + | die(x5)             |)
 | named(x3,kafka,per)   |   | thing(x4)           |
 | live(x1)              |   | event(x5)           |
 | agent(x1,x3)          |   | agent(x5,x3)        |
 | named(x2,austria,loc) |   | in(x5,x4)           |
 | event(x1)             |   | timex(x4)=+1924XXXX |
 | in(x1,x2)             |   |_____________________|
 |_______________________|


Page 45:

Co-reference resolution

• Question: What is the biggest sector in Andorra’s economy?

• Corpus: Andorra is a tiny land-locked country in southwestern Europe, between France and Spain. Tourism, the largest sector of Andorra’s tiny, well-to-do economy, accounts for roughly 80% of the GDP.

• Answer: Tourism

Page 46:

Question Answering

Lecture 3• Query Generation

• Document Analysis

• Semantic Indexing

• Answer Extraction

• Selection and Ranking


Page 48:

Semantic indexing

• If we index documents on the token level, we cannot search for specific semantic concepts

• If we index documents on semantic concepts, we can formulate more specific queries

• Semantic indexing requires a complete preprocessing of the entire document collection [can be costly]


Page 50:

Semantic indexing example

• Example NL question: When did Franz Kafka die?

• Term-based
– query: kafka
– Returns all passages containing the term "kafka"

• Concept-based
– query: DATE & kafka
– Returns all passages containing the term "kafka" and a date expression
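The difference between the two query styles can be illustrated on a handful of passages. In the sketch below a four-digit-year pattern stands in for the DATE concept, and the check runs at query time; an actual semantic index would annotate date expressions once, when the collection is indexed. The function names are hypothetical.

```python
import re
from typing import List

# Crude stand-in for a DATE concept: a four-digit year (assumption).
YEAR = re.compile(r"\b(?:1[0-9]{3}|20[0-9]{2})\b")


def term_query(passages: List[str], term: str) -> List[str]:
    """Term-based retrieval: every passage containing the term."""
    return [p for p in passages if term in p.lower()]


def concept_query(passages: List[str], term: str) -> List[str]:
    """Concept-based retrieval: the term plus a date expression."""
    return [p for p in term_query(passages, term) if YEAR.search(p)]


passages = [
    "Kafka lived in Austria.",
    "Kafka died in 1924.",
    "Prions are a kind of protein.",
]
print(term_query(passages, "kafka"))
print(concept_query(passages, "kafka"))
```

The concept-based query returns fewer passages, and each of them contains at least one candidate of the expected answer type.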

Page 51:

Question Answering

Lecture 3• Query Generation

• Document Analysis

• Semantic Indexing

• Answer Extraction

• Selection and Ranking


Page 53:

Answer extraction

• Passage retrieval gives us a set of ranked documents

• Match answer with question
– DRS for question
– DRS for each possible document
– Score for amount of overlap

• Deep inference or shallow matching

• Use knowledge

Page 54:

Answer extraction: matching

• Given a question and an expression with a potential answer, calculate a matching score S = match(Q,A) that indicates how well Q matches A

• Example
– Q: When was Franz Kafka born?
– A1: Franz Kafka died in 1924.
– A2: Kafka was born in 1883.

Page 55:

Using logical inference

• Recall that Boxer produces first order representations [DRSs]

• In theory we could use a theorem prover to check whether a retrieved passage entails or is inconsistent with a question

• In practice this is too costly, given the high number of possible answer + question pairs that need to be considered

• Also: theorem provers are precise – they don’t give us information if they almost find a proof, although this would be useful for QA

Page 56:

Semantic matching

• Matching is an efficient approximation to the inference task

• Consider a flat semantic representation of the passage and the question

• Matching gives a score of the amount of overlap between the semantic content of the question and a potential answer

Page 57:

Matching Example

• Question: When was Franz Kafka born?

• Passage 1: Franz Kafka died in 1924.

• Passage 2: Kafka was born in 1883.

Page 58:

Semantic Matching [1]

Q:              A1:
answer(X)       franz(x1)
franz(Y)        kafka(x1)
kafka(Y)        die(x3)
born(E)         agent(x3,x1)
patient(E,Y)    in(x3,x2)
temp(E,X)       1924(x2)

Page 59:

Semantic Matching [1]

Q:              A1:
answer(X)       franz(x1)
franz(Y)        kafka(x1)
kafka(Y)        die(x3)
born(E)         agent(x3,x1)
patient(E,Y)    in(x3,x2)
temp(E,X)       1924(x2)

X=x2

Page 60:

Semantic Matching [1]

Q:              A1:
answer(x2)      franz(x1)
franz(Y)        kafka(x1)
kafka(Y)        die(x3)
born(E)         agent(x3,x1)
patient(E,Y)    in(x3,x2)
temp(E,x2)      1924(x2)

Page 61:

Semantic Matching [1]

Q:              A1:
answer(x2)      franz(x1)
franz(Y)        kafka(x1)
kafka(Y)        die(x3)
born(E)         agent(x3,x1)
patient(E,Y)    in(x3,x2)
temp(E,x2)      1924(x2)

Y=x1

Page 62:

Semantic Matching [1]

Q:              A1:
answer(x2)      franz(x1)
franz(x1)       kafka(x1)
kafka(x1)       die(x3)
born(E)         agent(x3,x1)
patient(E,x1)   in(x3,x2)
temp(E,x2)      1924(x2)


Page 64:

Semantic Matching [1]

Q:              A1:
answer(x2)      franz(x1)
franz(x1)       kafka(x1)
kafka(x1)       die(x3)
born(E)         agent(x3,x1)
patient(E,x1)   in(x3,x2)
temp(E,x2)      1924(x2)

Match score = 3/6 = 0.50

Page 65:

Semantic Matching [2]

Q:              A2:
answer(X)       kafka(x1)
franz(Y)        born(x3)
kafka(Y)        patient(x3,x1)
born(E)         in(x3,x2)
patient(E,Y)    1883(x2)
temp(E,X)

Page 66:

Semantic Matching [2]

Q:              A2:
answer(X)       kafka(x1)
franz(Y)        born(x3)
kafka(Y)        patient(x3,x1)
born(E)         in(x3,x2)
patient(E,Y)    1883(x2)
temp(E,X)

X=x2

Page 67:

Semantic Matching [2]

Q:              A2:
answer(x2)      kafka(x1)
franz(Y)        born(x3)
kafka(Y)        patient(x3,x1)
born(E)         in(x3,x2)
patient(E,Y)    1883(x2)
temp(E,x2)

Page 68:

Semantic Matching [2]

Q:              A2:
answer(x2)      kafka(x1)
franz(Y)        born(x3)
kafka(Y)        patient(x3,x1)
born(E)         in(x3,x2)
patient(E,Y)    1883(x2)
temp(E,x2)

Y=x1

Page 69:

Semantic Matching [2]

Q:              A2:
answer(x2)      kafka(x1)
franz(x1)       born(x3)
kafka(x1)       patient(x3,x1)
born(E)         in(x3,x2)
patient(E,x1)   1883(x2)
temp(E,x2)

Page 70:

Semantic Matching [2]

Q:              A2:
answer(x2)      kafka(x1)
franz(x1)       born(x3)
kafka(x1)       patient(x3,x1)
born(E)         in(x3,x2)
patient(E,x1)   1883(x2)
temp(E,x2)

E=x3

Page 71:

Semantic Matching [2]

Q:              A2:
answer(x2)      kafka(x1)
franz(x1)       born(x3)
kafka(x1)       patient(x3,x1)
born(x3)        in(x3,x2)
patient(x3,x1)  1883(x2)
temp(x3,x2)


Page 73:

Semantic Matching [2]

Q:              A2:
answer(x2)      kafka(x1)
franz(x1)       born(x3)
kafka(x1)       patient(x3,x1)
born(x3)        in(x3,x2)
patient(x3,x1)  1883(x2)
temp(x3,x2)

Match score = 4/6 = 0.67

Page 74:

Matching Example

• Question: When was Franz Kafka born?

• Passage 1: Franz Kafka died in 1924. (Match score = 0.50)

• Passage 2: Kafka was born in 1883. (Match score = 0.67)

Page 75:

Matching Techniques

• Weighted matching
– Higher weight for named entities
– Estimate weights using machine learning

• Incorporate background knowledge
– WordNet [hyponyms]
– NomLex
– Paraphrases: BORN(E) & IN(E,Y) & DATE(Y) → TEMP(E,Y)
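The two techniques can be combined in a rough sketch: first apply the paraphrase rule to the passage, then score with per-predicate weights. Everything here is an assumption made for illustration: the function names, the weight values (2.0 for the named-entity predicates, 1.0 otherwise), and the shortcut of treating a one-place predicate that looks like a four-digit year as evidence for DATE(Y). Learned weights would replace the hand-set ones.

```python
import re


def is_date(var, conditions):
    # Assumption: a predicate such as 1883(x2) marks x2 as a date.
    return any(re.fullmatch(r"\d{4}", pred) and args == (var,)
               for pred, args in conditions)


def apply_born_in_rule(conditions):
    """Add temp(E,Y) wherever born(E), in(E,Y) and a date Y are present."""
    derived = set(conditions)
    born_events = {args[0] for pred, args in conditions if pred == "born"}
    for pred, args in conditions:
        if pred == "in" and len(args) == 2:
            event, y = args
            if event in born_events and is_date(y, conditions):
                derived.add(("temp", (event, y)))
    return derived


def weighted_score(question, passage, weights):
    """Weight of matched question conditions over total question weight."""
    passage = set(passage)
    total = sum(weights.get(pred, 1.0) for pred, _ in question)
    hit = sum(weights.get(pred, 1.0) for pred, args in question
              if (pred, args) in passage)
    return hit / total


question = [
    ("answer", ("x2",)), ("franz", ("x1",)), ("kafka", ("x1",)),
    ("born", ("x3",)), ("patient", ("x3", "x1")), ("temp", ("x3", "x2")),
]
passage = [
    ("kafka", ("x1",)), ("born", ("x3",)), ("patient", ("x3", "x1")),
    ("in", ("x3", "x2")), ("1883", ("x2",)),
]
weights = {"franz": 2.0, "kafka": 2.0}  # hypothetical named-entity weights

before = weighted_score(question, passage, weights)
after = weighted_score(question, apply_born_in_rule(passage), weights)
print(before, after)  # 0.5 0.625
```

With the rule applied, temp(x3,x2) becomes available in the passage, so the question condition that previously failed now contributes to the score.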

Page 76:

Question Answering

Lecture 3
• Query Generation
• Document Analysis
• Semantic Indexing
• Answer Extraction
• Selection and Ranking

Page 77:

Architecture of PRONTO

[Diagram. Component labels: question, parsing, ccg, boxing, drs, knowledge (WordNet, NomLex), answer typing, query, Indri, Indexed Documents, answer extraction, answer selection, answer reranking, answer]

Page 78:

Answer selection

• Rank answer
– Group duplicates
– Syntactically or semantically equivalent
– Sort on frequency

• How specific should an answer be?
– Semantic relations between answers
– Hyponyms, synonyms
– Answer modelling [PhD thesis Dalmas 2007]

• Answer cardinality
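A minimal sketch of the first bullet, grouping duplicates and sorting on frequency. It only handles surface-level equivalence (case and punctuation); the normalise function is a hypothetical stand-in for the syntactic or semantic equivalence test a real system would use, and the candidate list is made up.

```python
import re
from collections import Counter


def normalise(answer):
    """Lower-case, drop punctuation and collapse whitespace."""
    return " ".join(re.sub(r"[^\w\s]", "", answer).lower().split())


def rank_by_frequency(answers):
    """Group answers with the same normal form, most frequent first."""
    return Counter(normalise(a) for a in answers).most_common()


candidates = ["Oklahoma", "oklahoma", "Okemah, Okla.", "Oklahoma.", "New York"]
ranked = rank_by_frequency(candidates)
print(ranked[0])  # ('oklahoma', 3)
```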

Page 79:

Answer selection example 1

• Where did Franz Kafka die?
– In his bed
– In a sanatorium
– In Kierling
– Near Vienna
– In Austria

– In Berlin
– In Germany

Page 80:

Answer selection example 2

• Where is 3M based?
– In Maplewood
– In Maplewood, Minn.
– In Minnesota
– In the U.S.
– In Maplewood, Minn., USA

– In San Francisco
– In the Netherlands
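One way to use semantic relations between answers is sketched below, under stated assumptions: a hand-written part-of table (PART_OF is hypothetical, a real system would draw on a gazetteer or WordNet), and a support count that credits an answer with every candidate naming the same place or a region containing it. Coarser answers then back up the more specific one instead of competing with it.

```python
# Hypothetical part-of relation for the candidate locations.
PART_OF = {
    "Maplewood": "Minnesota",
    "Minnesota": "U.S.",
    "San Francisco": "California",
    "California": "U.S.",
}


def ancestors(place):
    """All regions that contain the given place, nearest first."""
    chain = []
    while place in PART_OF:
        place = PART_OF[place]
        chain.append(place)
    return chain


def support(answer, candidates):
    """Number of candidates equal to the answer or to a containing region."""
    region = {answer, *ancestors(answer)}
    return sum(1 for c in candidates if c in region)


def most_supported(candidates):
    return max(candidates, key=lambda a: support(a, candidates))


candidates = ["Maplewood", "Minnesota", "U.S.", "San Francisco", "Netherlands"]
print(most_supported(candidates))  # Maplewood
```

Here "Maplewood" is supported by "Minnesota" and "U.S.", while "San Francisco" is supported only by "U.S." and "Netherlands" by nothing else.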

Page 81:

Architecture of PRONTO

[Diagram. Component labels: question, parsing, ccg, boxing, drs, knowledge (WordNet, NomLex), answer typing, query, Indri, Indexed Documents, answer extraction, answer selection, answer reranking, answer]

Page 82:

Reranking

• Most QA systems first produce a list of possible answers…

• This is usually followed by a process called reranking

• Reranking promotes correct answers to a higher rank

Page 83:

Factors in reranking

• Matching score
– The better the match with the question, the more likely the answer is correct

• Frequency
– If the same answer occurs many times, it is likely to be correct
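The slide names the two factors but not how they are combined. The sketch below assumes a simple linear interpolation with a made-up mixing parameter alpha and frequencies normalised by the largest count; the frequencies in the example are invented, only the match scores come from the Kafka example.

```python
def rerank(candidates, alpha=0.5):
    """Sort (answer, match_score, frequency) triples by a combined score.

    Assumption: combined = alpha * match + (1 - alpha) * relative frequency.
    """
    max_freq = max(freq for _, _, freq in candidates)

    def combined(candidate):
        _, match, freq = candidate
        return alpha * match + (1 - alpha) * freq / max_freq

    return sorted(candidates, key=combined, reverse=True)


# Match scores from the matching example; frequencies are made up.
candidates = [("1924", 0.50, 1), ("1883", 0.67, 3)]
reranked = rerank(candidates)
print([answer for answer, _, _ in reranked])  # ['1883', '1924']
```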

Page 84:

Answer Validation

• Answer validation
– Check whether an answer is likely to be correct using an expensive method

• Tie breaking
– Deciding between two answers with similar probability

• Methods:
– Inference check
– Sanity checking
– Googling

Page 85:

Inference check

• Use first-order logic [FOL] to check whether a potential answer entails the question

• This can be done with the use of a theorem prover
– Translate Q into FOL
– Translate A into FOL
– Translate background knowledge into FOL
– If ((BKfol & Afol) → Qfol) is a theorem, we have a likely answer
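A full first-order theorem prover is beyond a short example, so the sketch below swaps in a much weaker stand-in: forward chaining over Horn rules followed by a conjunctive query. It only covers the case where the answer is a set of ground facts, the background knowledge is a set of if-then rules, and the question is a conjunction with existential variables. Upper-case terms are variables, lower-case terms are constants; the constants k, e1, y1 and the date(y1) fact are assumptions added for the example.

```python
def is_var(term):
    return term[:1].isupper()


def match(atom, fact, subst):
    """Extend subst so that atom equals fact, or return None."""
    if atom[0] != fact[0] or len(atom) != len(fact):
        return None
    subst = dict(subst)
    for term, value in zip(atom[1:], fact[1:]):
        if is_var(term):
            if subst.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return subst


def solutions(atoms, facts, subst=None):
    """Yield every substitution that maps all atoms onto known facts."""
    subst = {} if subst is None else subst
    if not atoms:
        yield subst
        return
    for fact in facts:
        extended = match(atoms[0], fact, subst)
        if extended is not None:
            yield from solutions(atoms[1:], facts, extended)


def forward_chain(facts, rules):
    """Apply the rules until no new fact can be derived."""
    facts = set(facts)
    while True:
        new = set()
        for body, head in rules:
            for s in solutions(body, facts):
                derived = (head[0],) + tuple(s.get(t, t) for t in head[1:])
                if derived not in facts:
                    new.add(derived)
        if not new:
            return facts
        facts |= new


def entails(answer_facts, rules, question_atoms):
    closure = forward_chain(answer_facts, rules)
    return next(solutions(question_atoms, closure), None) is not None


# A: Kafka was born in 1883 (date(y1) assumed to come from entity tagging).
answer = [("kafka", "k"), ("born", "e1"), ("patient", "e1", "k"),
          ("in", "e1", "y1"), ("1883", "y1"), ("date", "y1")]
# BK: born(E) & in(E,Y) & date(Y) -> temp(E,Y)
background = [([("born", "E"), ("in", "E", "Y"), ("date", "Y")],
               ("temp", "E", "Y"))]
# Q: is there a time Y at which Kafka was born?
question = [("kafka", "X"), ("born", "E"), ("patient", "E", "X"),
            ("temp", "E", "Y")]

print(entails(answer, background, question))  # True
print(entails(answer, [], question))          # False
```

The second call shows why the background knowledge matters: without the paraphrase rule the passage never states temp(E,Y), so the question does not follow.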

Page 86:

Sanity Checking

Answer should be informative, that is, not part of the question

Q: Who is Tom Cruise married to?

A: Tom Cruise

Q: Where was Florence Nightingale born?

A: Florence
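The check can be sketched as a token-overlap test: an answer is rejected when every one of its words already occurs in the question. The function names are made up for illustration, and a real system would likely compare normalised entities rather than raw tokens.

```python
import re


def tokens(text):
    return set(re.findall(r"\w+", text.lower()))


def is_informative(question, answer):
    """False when the answer only repeats words from the question."""
    answer_tokens = tokens(answer)
    return bool(answer_tokens) and not answer_tokens <= tokens(question)


print(is_informative("Who is Tom Cruise married to?", "Tom Cruise"))       # False
print(is_informative("Where was Florence Nightingale born?", "Florence"))  # False
print(is_informative("Where did Franz Kafka die?", "Kierling"))            # True
```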

Page 87:

Googling

• Given a ranked list of answers, some of these might not make sense at all

• Promote answers that make sense

• How?

• Use an even larger corpus!
– “Sloppy” approach
– “Strict” approach

Page 88:

The World Wide Web

Page 89:

Answer validation (sloppy)

• Given a question Q and a set of answers A1…An

• For each i, generate query Q Ai

• Count the number of hits for each i

• Choose the Ai with the most hits

• Use existing search engines
– Google, AltaVista
– Magnini et al. 2002 (CCP)
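The procedure can be sketched as follows. No real search-engine API is called: hits is a caller-supplied function from a query string to a hit count, and the stub index with its counts is entirely made up so that the example runs offline.

```python
def validate_sloppy(question, answers, hits):
    """Query 'Q Ai' for each answer and return the answer with most hits."""
    counts = {answer: hits(f"{question} {answer}") for answer in answers}
    best = max(answers, key=counts.get)
    return best, counts


# Stub standing in for a search engine; the counts are invented.
fake_index = {
    "Guthrie born Britain": 12,
    "Guthrie born Oklahoma": 340,
    "Guthrie born Newport": 25,
}


def fake_hits(query):
    return fake_index.get(query, 0)


best, counts = validate_sloppy("Guthrie born", ["Britain", "Oklahoma", "Newport"],
                               fake_hits)
print(best)  # Oklahoma
```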

Page 90:

Corrected Conditional Probability

• Treat Q and A as a bag of words
– Q = content words of the question
– A = answer

• CCP(Qsp,Asp) = hits(A NEAR Q) / (hits(A) x hits(Q))

• Accept answers above a certain CCP threshold
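A sketch of the formula as it appears on this slide, followed by the threshold filter. The hit counts and the threshold value are invented for illustration, and returning 0.0 when a denominator count is zero is a choice made here to avoid division by zero.

```python
def ccp(hits_near, hits_answer, hits_question):
    """hits(A NEAR Q) / (hits(A) x hits(Q)), as given on the slide."""
    if hits_answer == 0 or hits_question == 0:
        return 0.0
    return hits_near / (hits_answer * hits_question)


def accept(candidates, threshold):
    """Keep the answers whose CCP reaches the threshold."""
    return [answer for answer, counts in candidates.items()
            if ccp(*counts) >= threshold]


# answer -> (hits(A NEAR Q), hits(A), hits(Q)); all counts are made up.
candidates = {
    "Oklahoma": (40, 1000, 200),
    "Britain": (1, 5000, 200),
}
print(accept(candidates, threshold=1e-4))  # ['Oklahoma']
```

Dividing by hits(A) keeps very frequent answer strings from winning merely because they appear on many pages.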

Page 91:

Answer validation (strict)

• Given a question Q and a set of answers A1…An

• Create a declarative sentence with the focus of the question replaced by Ai

• Use the strict search option in Google
– High precision
– Low recall

• Any terms of the target not in the sentence are added to the query

Page 92:

Example

• TREC 99.3
Target: Woody Guthrie.
Question: Where was Guthrie born?

• Top-5 Answers:
1) Britain
* 2) Okemah, Okla.
3) Newport
* 4) Oklahoma
5) New York

Page 93:

Example: generate queries

• TREC 99.3
Target: Woody Guthrie.
Question: Where was Guthrie born?

• Generated queries:
1) “Guthrie was born in Britain”
2) “Guthrie was born in Okemah, Okla.”
3) “Guthrie was born in Newport”
4) “Guthrie was born in Oklahoma”
5) “Guthrie was born in New York”

Page 94:

Example: add target words

• TREC 99.3
Target: Woody Guthrie.
Question: Where was Guthrie born?

• Generated queries:
1) “Guthrie was born in Britain” Woody
2) “Guthrie was born in Okemah, Okla.” Woody
3) “Guthrie was born in Newport” Woody
4) “Guthrie was born in Oklahoma” Woody
5) “Guthrie was born in New York” Woody

Page 95:

Example: morphological variants

TREC 99.3

Target: Woody Guthrie.

Question: Where was Guthrie born?

Generated queries:

“Guthrie is OR was OR are OR were born in Britain” Woody

“Guthrie is OR was OR are OR were born in Okemah, Okla.” Woody

“Guthrie is OR was OR are OR were born in Newport” Woody

“Guthrie is OR was OR are OR were born in Oklahoma” Woody

“Guthrie is OR was OR are OR were born in New York” Woody
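The three steps shown across these example slides (declarative sentence, added target words, morphological variants) can be sketched as one query builder. It sidesteps the hard part, turning the question into a declarative sentence, by taking the subject and predicate as hand-supplied arguments; the function name and that decomposition are assumptions, and straight quotes are used in place of the slide's typographic ones.

```python
BE_FORMS = "is OR was OR are OR were"  # morphological variants of "was"


def strict_query(subject, predicate, answer, target):
    """Build a quoted sentence query plus any unused target words."""
    sentence = f"{subject} {BE_FORMS} {predicate} {answer}"
    used = set(sentence.lower().split())
    # Add target terms that do not already occur in the sentence.
    extra = [word for word in target.split() if word.lower() not in used]
    return " ".join([f'"{sentence}"', *extra])


answers = ["Britain", "Okemah, Okla.", "Newport", "Oklahoma", "New York"]
queries = [strict_query("Guthrie", "born in", a, "Woody Guthrie") for a in answers]
print(queries[0])
# "Guthrie is OR was OR are OR were born in Britain" Woody
```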

Page 96:

Example: Google hits

TREC 99.3

Target: Woody Guthrie.

Question: Where was Guthrie born?

Generated queries (with number of hits):

“Guthrie is OR was OR are OR were born in Britain” Woody (0 hits)

“Guthrie is OR was OR are OR were born in Okemah, Okla.” Woody (10 hits)

“Guthrie is OR was OR are OR were born in Newport” Woody (0 hits)

“Guthrie is OR was OR are OR were born in Oklahoma” Woody (42 hits)

“Guthrie is OR was OR are OR were born in New York” Woody (2 hits)

Page 97:

Example: reranked answers

TREC 99.3
Target: Woody Guthrie.
Question: Where was Guthrie born?

Original answers
1) Britain
* 2) Okemah, Okla.
3) Newport
* 4) Oklahoma
5) New York

Reranked answers
* 4) Oklahoma
* 2) Okemah, Okla.
5) New York
1) Britain
3) Newport
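The reranked list follows from sorting the original answers by the hit counts of the Google hits example (0, 10, 0, 42 and 2). A short sketch, assuming ties keep their original order, which a stable sort gives for free:

```python
def rerank_by_hits(answers, hits):
    """Sort answers by hit count, highest first; ties keep their order."""
    return sorted(answers, key=lambda answer: hits[answer], reverse=True)


answers = ["Britain", "Okemah, Okla.", "Newport", "Oklahoma", "New York"]
# Hit counts as reported in the Google hits example.
hits = {"Britain": 0, "Okemah, Okla.": 10, "Newport": 0,
        "Oklahoma": 42, "New York": 2}

print(rerank_by_hits(answers, hits))
# ['Oklahoma', 'Okemah, Okla.', 'New York', 'Britain', 'Newport']
```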

Page 98:

Question Answering (QA)

Lecture 1
• What is QA?
• Query Log Analysis
• Challenges in QA
• History of QA
• System Architecture
• Methods
• System Evaluation
• State-of-the-art

Lecture 2
• Question Analysis
• Background Knowledge
• Answer Typing

Lecture 3
• Query Generation
• Document Analysis
• Semantic Indexing
• Answer Extraction
• Selection and Ranking

Page 99:

Where to go from here

• Producing answers in real-time

• Improve accuracy

• Answer explanation

• User modelling

• Speech interfaces

• Dialogue (interactive QA)

• Multi-lingual QA

• Non-sequential architectures