Detecting Anaphoricity and Antecedenthood for Coreference Resolution
Olga Uryupina ([email protected])
Institute of Linguistics, RAS 13.11.08
Overview
• Anaphoricity and Antecedenthood
• Experiments
• Incorporating A&A detectors into a CR system
• Conclusion
A&A: example
Shares in Loral Space will be distributed to Loral shareholders. The new company will start life with no debt and $700 million in cash. Globalstar still needs to raise $600 million, and Schwartz said that the company would try to raise the money in the debt market.
Anaphoricity
Likely anaphors:
- pronouns, definite descriptions
Unlikely anaphors:
- indefinites
Unknown:
- proper names
Poesio & Vieira: more than 50% of definite descriptions in newswire text are not anaphoric!
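The surface-form cues above can be sketched as a toy rule, purely for illustration (this is an assumed heuristic, not the classifier from the talk; the pronoun list and determiner tests are simplifications):

```python
def likely_anaphor(np: str) -> str:
    """Rough surface-form triage of an NP into likely/unlikely/unknown anaphor.

    Illustrative only: real systems use POS tags and parses, not string tests.
    """
    tokens = np.lower().split()
    pronouns = {"he", "she", "it", "they", "him", "her", "them", "this", "that"}
    if len(tokens) == 1 and tokens[0] in pronouns:
        return "likely"        # pronouns
    if tokens[0] == "the":
        return "likely"        # definite descriptions
    if tokens[0] in {"a", "an", "some", "no"}:
        return "unlikely"      # indefinites
    if np[:1].isupper():
        return "unknown"       # proper names
    return "unknown"
```

Note that, per the Poesio & Vieira figure, "likely" for definite descriptions is a weak prior: most of them are in fact not anaphoric.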
A&A: example
Shares in Loral Space will be distributed to Loral shareholders. The new company will start life with no debt and $700 million in cash. Globalstar still needs to raise $600 million, and Schwartz said that the company would try to raise the money in the debt market.
Antecedenthood
Related to referentiality (Karttunen, 1976): "no debt", etc.
Antecedenthood vs. referentiality: corpus-based decision
Experiments
• Can we learn anaphoricity/antecedenthood classifiers?
• Do they help for coreference resolution?
Methodology
• MUC-7 dataset
• Anaphoricity/antecedenthood induced from the MUC annotations
• Ripper, SVM
Features
• Surface form (12)
• Syntax (20)
• Semantics (3)
• Salience (10)
• "same-head" (2)
• From Karttunen, 1976 (7)
49 features, expanded into 123 boolean/continuous values
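One way a feature set expands into a larger boolean/continuous vector is one-hot encoding of the symbolic features (an assumption about the expansion; the feature names below are hypothetical):

```python
def expand_features(feats, categories):
    """One-hot encode categorical features; pass numeric ones through.

    feats: dict of feature name -> raw value
    categories: dict of categorical feature name -> list of possible values
    Returns a flat list of boolean/continuous values, in sorted-name order.
    """
    row = []
    for name, value in sorted(feats.items()):
        if name in categories:                # categorical -> one boolean per value
            row.extend(1.0 if value == c else 0.0 for c in categories[name])
        else:                                 # already numeric (e.g. a salience score)
            row.append(float(value))
    return row

# Hypothetical example: one 4-way categorical feature plus one continuous one
cats = {"np_form": ["pronoun", "def_desc", "indef", "proper_name"]}
vec = expand_features({"np_form": "def_desc", "sent_dist": 2}, cats)
# -> [0.0, 1.0, 0.0, 0.0, 2.0]
```

Vectors like this are what learners such as SVMs (or Ripper, on the symbolic side) consume.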
Results: anaphoricity
Feature groups R P F
Baseline 100 66.5 79.9
All 93.5 82.3 87.6
Surface 100 66.5 79.9
Syntax 97.4 72.0 82.8
Semantics 98.5 68.9 81.1
Salience 91.2 69.3 78.7
Same-head 84.5 81.1 82.8
Karttunen's 91.6 71.1 80.1
Synt+SH 90.0 83.5 86.6
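The F column in these tables is the harmonic mean of recall and precision; a one-line check (assuming the standard balanced F-measure):

```python
def f1(r, p):
    """Balanced F-measure: harmonic mean of recall and precision."""
    return 2 * p * r / (p + r)

# e.g. the Same-head row above: R=84.5, P=81.1 gives F of about 82.8
```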
Results: antecedenthood
Feature groups R P F
Baseline 100 66.5 79.9
All 95.7 69.2 80.4
Surface 94.6 68.5 79.5
Syntax 95.7 69.2 80.3
Semantics 94.9 69.4 80.2
Salience 98.9 67.0 79.9
Same-head 100 66.5 79.9
Karttunen's 99.3 67.3 80.2
Integrating A&A into a CR system
Apply A&A prefiltering before CR starts:
- saves time
- improves precision
Problem: we can filter out good candidates:
- will lose some recall
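The prefiltering step amounts to skipping rejected mentions before candidate pairs are ever formed. A minimal sketch, assuming the A&A detectors are passed in as callbacks (both are hypothetical stubs here):

```python
def candidate_pairs(mentions, is_anaphoric, can_be_antecedent):
    """Form (antecedent, anaphor) candidate pairs for a pairwise CR model,
    skipping mentions the A&A detectors reject.

    mentions: list in document order
    is_anaphoric, can_be_antecedent: hypothetical classifier callbacks
    """
    pairs = []
    for j, anaphor in enumerate(mentions):
        if not is_anaphoric(anaphor):              # the "±ana" filter
            continue
        for antecedent in mentions[:j]:
            if not can_be_antecedent(antecedent):  # the "±ante" filter
                continue
            pairs.append((antecedent, anaphor))
    return pairs
```

With always-true callbacks this yields all n*(n-1)/2 ordered pairs; the filters can only shrink that set, which is why precision can rise while recall can only stay the same or drop.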
Oracle-based A&A prefiltering
Take the MUC-based A&A classifier ("gold standard")
CR system: Soon et al. (2001) with SVMs
MUC-7 validation set (3 "training" documents)
Oracle-based A&A prefiltering
R P F
No prefiltering 54.5 56.9 55.7
±ana 49.6 73.6 59.3
±ante 54.2 69.4 60.9
±ana & ±ante 52.9 81.9 64.3
Automatically induced classifiers
Precision is more crucial than recall
Learn Ripper classifiers with different values of L (loss ratio)
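Ripper's loss ratio L weights the two error types against each other during training, biasing the rules toward precision or recall. With any classifier that produces scores, a comparable trade can be illustrated by moving the decision threshold (sketch only; the scores and threshold values below are made up):

```python
def predict(scores, threshold):
    """Raising the threshold yields fewer positives: typically higher
    precision, lower or equal recall -- the trade the loss ratio controls."""
    return [s >= threshold for s in scores]

def precision_recall(pred, gold):
    """Precision and recall of boolean predictions against boolean gold."""
    tp = sum(p and g for p, g in zip(pred, gold))
    fp = sum(p and not g for p, g in zip(pred, gold))
    fn = sum(g and not p for p, g in zip(pred, gold))
    return tp / (tp + fp), tp / (tp + fn)

scores = [0.9, 0.6, 0.4, 0.2]
gold = [True, False, True, False]
# threshold 0.5 -> P=0.5, R=0.5; threshold 0.8 -> P=1.0, R=0.5
```

Since the oracle experiments show prefiltering pays off mainly through precision, a precision-favoring setting (high L, or a high threshold in this sketch) is the natural choice.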
Anaphoricity prefiltering
Antecedenthood prefiltering
Conclusion
Automatically induced detectors:
• Reliable for anaphoricity
• Much less reliable for antecedenthood (a corpus explicitly annotated for referentiality could help)
A&A prefiltering:
• Ideally, should help
• In practice, substantial optimization required
Thank You!