
Intrinsic and Extrinsic Approaches to Recognizing Textual Entailment Rui Wang DFKI GmbH & Saarland University



  • Intrinsic vs. Extrinsic

    5/23/11 Oslo, Norway

  • Recognizing Textual Entailment (RTE)

    • Textual Entailment (Chierchia and McConnell-Ginet, 2000) – The meaning of one text can be inferred from another text.

    • An example – Text (T): Google files for its long awaited IPO. – Hypothesis (H): Google goes public.
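The T/H pair above is exactly what an RTE system receives as input. A minimal bag-of-words baseline (the "BoW" system referenced later in the results) can be sketched as follows; the 0.6 threshold and whitespace tokenization are illustrative assumptions, not the actual system's settings:

```python
# Bag-of-words RTE baseline: predict entailment when enough of H's words
# are covered by T. Threshold and tokenization are illustrative choices.

def bow_entails(text, hypothesis, threshold=0.6):
    t_words = set(text.lower().split())
    h_words = set(hypothesis.lower().split())
    if not h_words:
        return True
    overlap = len(h_words & t_words) / len(h_words)
    return overlap >= threshold

t = "Google files for its long awaited IPO."
h = "Google goes public."
print(bow_entails(t, h))  # False: overlap alone cannot bridge "IPO" and "goes public"
```

The pair is a true entailment, yet only one of H's three words appears in T; that gap is what the deeper intrinsic approaches are designed to close.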

  • Scope • Logic Entailment

    – Whenever T is true, H is true (equivalently: H is true OR T is not true)

    • Linguistic Implication – Conventional Implicature – Conversational Implicature

    • Modality
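The logic-entailment condition ("H is true OR T is not true") is the material-conditional reading; the standard model-theoretic statement (a textbook definition, not verbatim from the slides) is:

```latex
T \models H
\quad\Longleftrightarrow\quad
\text{for every model } \mathcal{M}:\quad
\mathcal{M} \models T \;\Rightarrow\; \mathcal{M} \models H
```

Since the material conditional $T \Rightarrow H$ is equivalent to $\lnot T \lor H$, this matches the slide's formulation.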

  • Machine Translation (MT) Triangle

  • RTE  Rectangle  


  • Outline  


  • Outline  (cont.)  •  Intrinsic  Approaches  

    –  Architecture  –  RTE  with  Event  Triples  –  RTE  with  Inference  Rules  

    • Extrinsic Approaches – Motivation – Multi-Dimensional Classification Model

    • Summary and Perspectives

  • Intrinsic  Approaches  

  • An Example • T: Bush used his weekly radio address to try to build support for his plan to allow workers to divert part of their Social Security payroll taxes into private investment accounts.

    • H: Mr. Bush is proposing that workers be allowed to divert their payroll taxes into private accounts.

  • Performance of the Existing Systems

    [Bar chart: accuracy of the top five submitted systems and Wang's system on RTE-3, RTE-4, and RTE-5; reported values include 0.8, 0.746, 0.735, 0.706, 0.685, and 0.669 (y-axis: accuracy, 0.5 to 0.85).]

  • Architecture  


  • Specialized RTE Module • Two requirements

    – A good target – A good tackle

    • Two examples – Event tuples containing named-entities – Tree skeleton with inference rules

  • Event Tuple • < Event type, Time, Location, List >

    – List: Persons, Organizations
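The tuple structure above can be sketched as a plain data class; the field and variable names are my own, not the actual system's schema:

```python
from dataclasses import dataclass, field
from typing import Optional

# One event tuple per clause: < Event type, Time, Location, List >,
# where List holds the participating persons and organizations.
# Field names are illustrative, not taken from the TACTE implementation.
@dataclass
class EventTuple:
    event_type: str                 # e.g. a verb lemma such as "bite"
    time: Optional[str] = None      # normalized time expression
    location: Optional[str] = None
    persons: list = field(default_factory=list)
    organizations: list = field(default_factory=list)

# The Tyson example below: the biting event happened in 1997, not 1996.
t_event = EventTuple("bite", time="1997", persons=["Tyson", "Holyfield"])
h_event = EventTuple("bite", time="1996", persons=["Mike Tyson", "Holyfield"])
print(t_event.time == h_event.time)  # False: the time slots clash
```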

  • An Example • T: Released in 1995, Tyson returned to boxing, winning the World Boxing Council title in 1996. The same year, however, he lost to Evander Holyfield, and in a 1997 rematch bit Holyfield's ear, for which he was temporarily banned from boxing.

    • H: In 1996 Mike Tyson bit Holyfield's ear.

  • Event  Time  Pair  •  T:  

    –   –   –   –   

    •  H:  –   


  • Event  Tuple  •  T:  

    –   –   –   –   

    •  H:  –   


  • Entailment Recognition • Event type comparison

    – Lexical resources, e.g. WordNet, VerbOcean

    • Named-entity comparison – Time expression normalization and anchoring – Ontology of geographical terms – Person name / organization name partial matching
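The comparison steps above can be sketched as follows; a tiny hand-written synonym table stands in for WordNet/VerbOcean, and the dict schema is an illustrative assumption:

```python
# Sketch of event-tuple comparison: event types must match (via a toy
# synonym table standing in for WordNet/VerbOcean), normalized time
# expressions must agree, and person names match partially (token level).

SYNONYMS = {("file", "submit"), ("award", "receive")}  # toy stand-in resource

def types_match(t_type, h_type):
    return t_type == h_type or (t_type, h_type) in SYNONYMS

def names_match(t_name, h_name):
    # Partial matching: "Tyson" matches "Mike Tyson" via the shared token.
    return bool(set(t_name.split()) & set(h_name.split()))

def tuple_entails(t, h):
    if not types_match(t["type"], h["type"]):
        return False
    if t["time"] and h["time"] and t["time"] != h["time"]:
        return False
    # Every person mentioned in H must be matched by some person in T.
    return all(any(names_match(tn, hn) for tn in t["persons"])
               for hn in h["persons"])

t = {"type": "bite", "time": "1997", "persons": ["Tyson", "Holyfield"]}
h = {"type": "bite", "time": "1996", "persons": ["Mike Tyson", "Holyfield"]}
print(tuple_entails(t, h))  # False: normalized times 1997 vs. 1996 disagree
```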

  • Inference Rules

    • DIRT collection (Lin and Pantel, 2001)

  • Tree Skeletons

    T: Doctor Robin Warren and Barry Marshall received Nobel Prize …

    H: Robin Warren was awarded a Nobel Prize.
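A tree skeleton is, roughly, the pair of dependency paths from the matched anchor nodes up to their lowest common ancestor. A toy sketch over a hand-written child-to-parent map (not real parser output):

```python
# Toy sketch of tree-skeleton extraction: given a dependency tree as a
# child -> parent map, the skeleton is the pair of paths from the anchor
# words (matched NEs) up to their lowest common ancestor. The tree below
# is hand-written for the Nobel Prize example, not real parser output.

def path_to_root(word, parents):
    path = [word]
    while word in parents:
        word = parents[word]
        path.append(word)
    return path

def tree_skeleton(left_anchor, right_anchor, parents):
    left = path_to_root(left_anchor, parents)
    right = path_to_root(right_anchor, parents)
    # Lowest common ancestor = first node on the left path shared by both.
    lca = next(node for node in left if node in right)
    return (left[:left.index(lca) + 1], right[:right.index(lca) + 1])

parents = {"Warren": "received", "Marshall": "Warren",
           "Prize": "received", "Nobel": "Prize"}
print(tree_skeleton("Warren", "Prize", parents))
# (['Warren', 'received'], ['Prize', 'received'])
```

The extracted path pair is then the unit matched against DIRT inference rules.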

  • Results

    RTE-2:
    Systems     BoW     TACTE   with ExDIRT
    Coverage    100%    10.2%   8.1%
    Accuracy*   0.579   0.684   0.654

    RTE-3:
    Systems     BoW     TACTE   with ExDIRT
    Coverage    100%    8.1%    5.8%
    Accuracy*   0.611   0.655   0.721

    *On covered data

  • Summary and Future Extensions

    Module                              Target                         Tackle                               Preferences
    Event tuples                        NE recognition                 NE relation/resolution               "No"
    Tree skeleton with inference rules  Matched by DIRT rules          Match tree skeleton with DIRT rules  "Yes"

    Future Extensions:
    Negation and modality               Negation words or modal verbs  Scope and entailment rules           N/A
    Logic inferencer                    High confidence                Theorem prover                       N/A
    External knowledge bases            Covered cases                  Knowledge application                N/A

  • Extrinsic  Approaches  

  • Relevance and Necessity

    No "necessity"

  • An Example • T: At least five people have been killed in a head-on train collision in north-eastern France, while others are still trapped in the wreckage. All the victims are adults.

    • H: A French train crash killed children.

    • Contradictory but relevant!

  • Textual Semantic Relation (TSR)

    Entailment

    Unknown

  • Related Tasks • Contradiction recognition

    – Contradiction vs. Others (de Marneffe et al., 2008)

    • Paraphrase acquisition – Paraphrase vs. Others (a survey by Androutsopoulos and Malakasiotis (2010))

    • Directionality recognition – Entailment vs. Paraphrase (e.g., Kotlerman et al., 2009)

  • RTE    TSR  Rectangle  


  • Meaning Representation • The TACTE system: Event time pairs • The ExTACTE system: Event tuples • The ExDIRT system: Tree skeletons

    • Dependency triple sets: {DEP_T} and {DEP_H} – Syntactic dependency tree – Semantic dependency graph – Joint dependency graph

  • Meaning Representation (cont.) • H: Value is questioned.

    • Syntactic dependency – –

    • Semantic dependency –

  • Meaning Representation (cont.) • T: Devotees of the market question the value of the work.

  • Features • H_NULL? (Boolean): whether H has dependencies • T_NULL? (Boolean): whether T has dependencies • DIR? (Boolean): whether T, H have same direction • MULTI? (Boolean): add "m_" to REL_PAIR if yes • DEP_SAME? (Boolean): whether same dependency • REL_SIM? (Boolean): whether similar relation • REL_SAME? (Boolean): whether same relation • REL_PAIR (String): dependency relation names
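A few of the features above can be sketched over dependency triple sets; the (head, relation, dependent) schema and the exact feature definitions here are illustrative simplifications:

```python
# Partial sketch of feature extraction over dependency triple sets.
# A triple is (head, relation, dependent); schema and details are illustrative.

def extract_features(dep_t, dep_h):
    feats = {
        "T_NULL": len(dep_t) == 0,                  # T has no dependencies
        "H_NULL": len(dep_h) == 0,                  # H has no dependencies
        "DEP_SAME": bool(set(dep_t) & set(dep_h)),  # a full triple is shared
    }
    rels_t = {rel for _, rel, _ in dep_t}
    rels_h = {rel for _, rel, _ in dep_h}
    feats["REL_SAME"] = bool(rels_t & rels_h)       # a relation name is shared
    feats["REL_PAIR"] = "_".join(sorted(rels_t | rels_h))
    return feats

# "Devotees of the market question the value ..." vs. "Value is questioned."
dep_t = [("question", "SBJ", "Devotees"), ("question", "OBJ", "value")]
dep_h = [("question", "OBJ", "value")]
print(extract_features(dep_t, dep_h)["DEP_SAME"])  # True: shared OBJ triple
```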

  • Direct Classification

  • Two-Stage Classification

  • Performance of the Existing Systems

    • Three-way RTE: Entailment, Contradiction, Unknown

    [Bar chart: accuracy of the top five submitted systems and Wang's system on three-way RTE-3 and RTE-4; reported values include 0.685, 0.683, 0.637, and 0.614 (y-axis: accuracy, 0.5 to 0.7).]

  • The  3-‐Dimensional  Model  


  • The 3-Dimensional Model (cont.) • Two-stage classification

    • Three measurements – Relatedness – Consistency – Equivalence

                       Relatedness   Consistency   Equivalence
    Paraphrase (P)     +             +             +
    Entailment (E)     +             +             -
    Contradiction (C)  +             -             -
    Unknown (U)        -             +             -
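The table above is a direct lookup from the three boolean measurements to a relation label; as a sketch:

```python
# Lookup from (relatedness, consistency, equivalence) to the TSR label,
# transcribing the table above. The fallback for combinations outside the
# table is my own assumption, not part of the model.

RELATION = {
    (True,  True,  True):  "Paraphrase",
    (True,  True,  False): "Entailment",
    (True,  False, False): "Contradiction",
    (False, True,  False): "Unknown",
}

def classify(relatedness, consistency, equivalence):
    return RELATION.get((relatedness, consistency, equivalence), "Unknown")

print(classify(True, False, False))  # "Contradiction"
```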

  • Corpora • Overview

    – The RTE corpora – The PETE corpus – The MSR corpus

    • Two issues – The annotation scheme – Data sources

  • The TSR Corpus • Adjacent sentence pairs from the RST Discourse Treebank

    • Six categories: backward entailment, forward entailment, equality, contradiction, overlapping, and independent

  • The AMT Corpus • Crowd-sourcing with Amazon Mechanical Turk • Facts vs. Counter-Facts

  • Datasets

    Corpora        Paraphrase (P)      Entailment (E)                        Contradiction (C)     Unknown (U)
    AMT (584)      /                   Facts (406)                           Counter-facts (178)   /
    MSR (5841)     Paraphrase (3940)   Non-Paraphrase (1901, spans E/C/U)
    PETE (367)     /                   YES (194)                             NO (173, spans C/U)
    RTE (2200)     /                   ENTAILMENT (1100)                     CONTRADICTION (330)   UNKNOWN (770)
    TSR (260)      Equality (3)        Forward/Backward Entailment (10/27)   Contradiction (17)    Overlapping & Independent (203)
    Total (9252)   3943                637                                   525                   973

  • Results (RTE)

    Systems                          4-Way (C, E, P, U)   3-Way (C, E&P, U)   2-Way (E&P, C&U)
    Direct BoW                       39.3%                54.5%               63.2%
    Direct Joint                     42.3%                50.9%               66.8%
    Only Relatedness (Our Prev.)     /                    59.1%               69.2%
    3-D Model                        45.9%                58.2%               69.9%
    MacCartney and Manning (2007)*   /                    /                   59.4%
    Heilman and Smith (2010)*        /                    /                   62.8%

    *Different test data

  • Results (Paraphrase)

    P vs. C&E&U                   Accuracy   Precision   Recall
    3-D Model                     79.6%      57.2%       72.8%
    Das and Smith (2009) (QG)*    73.9%      74.9%       91.3%
    Das and Smith (2009) (PoE)*   76.1%      79.6%       86.0%
    Heilman and Smith (2010)*     73.2%      75.7%       87.8%

    *Different test data

  • Results  (cont.)  


  • Summary  and  Perspec@ves  

  • Intrinsic Approaches • Event tuple

    – Event time pair – Extended to other slots (NEs)

    • (Textual) inference rules – DIRT (with tree skeleton) – More general paraphrase resources (ongoing)

  • Extrinsic Approaches • Textual semantic relations

    – Relatedness recognition – Extended with two other dimensions

    • Specialized TSR modules – Split the data (as for RTE) – Systematic feature engineering

  • Applications of RTE • Answer validation (Peñas et al., 2007; Rodrigo et al., 2008) – The best results for both English and German (Wang and Neumann, 2008a)

    • Relation validation (Wang and Neumann, 2008b)

    • Parser evaluation (Yuret et al., 2010) – The 3rd place (Wang and Zhang, 2010)

  • Acknowledgements • Chris Callison-Burch • Georgiana Dinu • Guenter Neumann • Caroline Sporleder • Hans Uszkoreit • Yajing Zhang • Yi Zhang

  • Thank YOU!

    Questions?