Relational Entity Linking with Cross Document Coreference


Transcript of Relational Entity Linking with Cross Document Coreference

Page 1: Relational Entity Linking with Cross Document Coreference

1

Relational Entity Linking with Cross Document Coreference

Xiao Cheng, Bingling Chen, Rajhans Samdani, Kai-Wei Chang, Zhiye Fei and Dan Roth

University of Illinois at Urbana-Champaign (UI_CCG)

Page 2: Relational Entity Linking with Cross Document Coreference

2

Talk Outline

Introduction
Architecture
Entity Linking Approach
Preprocessing
Wikification
Formulation
Relational Analysis
Cross Document Coreference
Reconciliation
Evaluation

Page 3: Relational Entity Linking with Cross Document Coreference

3

Entity Linking Specification

Query

<query id="EL13_ENG_0015"> <docid>bolt-eng-DF-170-181137-9030298</docid> <name>Lightning Bolts</name> <beg>15959</beg> <end>15973</end></query>

Output

query_id        link_id
EL13_ENG_0015   NIL0006
EL13_ENG_0016   E0273299
…
EL13_ENG_0821   NIL0006

[Figure: Wikipedia articles, split into TAC KB and non-TAC KB; axis from 0 to 5,000,000]

Page 4: Relational Entity Linking with Cross Document Coreference

4

Entity Linking using Wikification and Cross-Doc Coref

[Figure: Wikipedia articles, split into TAC KB and non-TAC KB; axis from 0 to 5,000,000]

query_id        link_id
EL13_ENG_0015   NIL0006
EL13_ENG_0016   E0273299
…
EL13_ENG_0821   NIL0006
…
EL13_ENG_0937   NIL0288
…
EL13_ENG_1914   NIL0288

Cross Document Coreference

Page 5: Relational Entity Linking with Cross Document Coreference

5

Wikification

Blumenthal (D) is a candidate for the U.S. Senate seat now held by Christopher Dodd (D), and he has held a commanding lead in the race since he entered it. But the Times report has the potential to fundamentally reshape the contest in the Nutmeg State.


Page 6: Relational Entity Linking with Cross Document Coreference

6

Wikification Challenges

Blumenthal (D) is a candidate for the U.S. Senate seat now held by Christopher Dodd (D), and he has held a commanding lead in the race since he entered it. But the Times report has the potential to fundamentally reshape the contest in the Nutmeg State.

Ambiguity: Times → The New York Times, The Times
Variability: Connecticut, CT, The Nutmeg State
Concepts outside of KB (NIL): Blumenthal?
Scale: Millions of labels

Page 7: Relational Entity Linking with Cross Document Coreference

7

Key Innovation

Improved Wikification for Structured EL
• Relational Inference for Linking (Cheng and Roth, EMNLP’13)
• No retraining

Non-trivial cross-document clustering
• Best Latent Left-Linking approach (Samdani et al. ’12)

Page 9: Relational Entity Linking with Cross Document Coreference

9

Entity Linking Architecture

[Architecture diagram; labels as extracted]
Input: TAC Query
Preprocessing: Query Normalization, Document Transformation, Purposeful Coreference
Linking: Wikification, Cross-Doc Coreference (Supervise; Linking Problem)
Reconcile Linking Clusters, Query Mapping
Entity types: Wikipedia Entities, TAC KB Entities, NIL, Partial NIL Clusters

Page 11: Relational Entity Linking with Cross Document Coreference

11

Preprocessing

Query normalization
• Handling spelling mistakes and slang: one of the reasons we did not achieve the expected performance
• Examples: Obomber, Obamadinejad, Osama Obama, Nobama, Obambi, Obamination, ObaMao, Owe Bama, 0bama, O-balm-a, O-bomb-a

In-document coreference
• Some coreferent mentions are easier to link than the query mention

Page 12: Relational Entity Linking with Cross Document Coreference

12

Preprocessing

Document transformation
• A document can be as long as 100k characters for a single query
• Need to truncate documents while minimizing the loss of critical context

[Diagram: the original document is reduced to its opening, the query context, and the coreferent context]
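The truncation idea on this slide can be sketched as follows. This is a minimal illustration under stated assumptions, not the system's implementation: the function name `truncate_document`, the window sizes, the end-exclusive character offsets, and the " … " joiner are all hypothetical choices.

```python
def truncate_document(text, query_span, coref_spans, opening=200, window=100):
    """Keep the document opening, the query context, and coreferent contexts.

    query_span and coref_spans are (begin, end) character offsets with an
    exclusive end. The sizes are illustrative assumptions.
    """
    # Spans to keep: the opening plus a window around each mention.
    keep = [(0, min(opening, len(text)))]
    for beg, end in [query_span] + list(coref_spans):
        keep.append((max(0, beg - window), min(len(text), end + window)))

    # Merge overlapping or touching spans so no text is duplicated.
    keep.sort()
    merged = [keep[0]]
    for beg, end in keep[1:]:
        last_beg, last_end = merged[-1]
        if beg <= last_end:
            merged[-1] = (last_beg, max(last_end, end))
        else:
            merged.append((beg, end))

    # Join the surviving pieces, marking the cuts with an ellipsis.
    return " … ".join(text[b:e] for b, e in merged)
```

Merging the spans before slicing keeps the output free of repeated text when the query mention sits close to the opening or to a coreferent mention.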

Page 14: Relational Entity Linking with Cross Document Coreference

14

Wikification Bottleneck

Blumenthal (D) is a candidate for the U.S. Senate seat now held by Christopher Dodd (D), and he has held a commanding lead in the race since he entered it. But the Times report has the potential to fundamentally reshape the contest in the Nutmeg State.

• State-of-the-art Wikification systems (Ratinov et al. 2011) can achieve the above with local and global statistical features
• Reaches a bottleneck around 70% to 85% F1 on non-wiki datasets
• What is missing?

Page 15: Relational Entity Linking with Cross Document Coreference

15

Motivating Example

Mubarak, the wife of deposed Egyptian President Hosni Mubarak, …

What are we missing with Bag of Words (BOW) models? Who is Mubarak?

Constraining interaction between concepts: (Mubarak, wife, Hosni Mubarak)

Page 16: Relational Entity Linking with Cross Document Coreference

16

Relational Inference for Wikification

Our contribution
• Identify key textual relations for Wikification
• A global inference framework to incorporate relational knowledge

Significant improvement over state-of-the-art Wikification systems

Mubarak, the wife of deposed Egyptian President Hosni Mubarak, …

(Mubarak, wife, Hosni Mubarak)

Page 17: Relational Entity Linking with Cross Document Coreference

17

Traditional Wikification Pipeline

1. Mention Segmentation
2. Candidate Generation
3. Candidate Ranking
4. Determine NILs (NIL Linking)

Page 18: Relational Entity Linking with Cross Document Coreference

18

Traditional Wikification 1 - Mention Segmentation

...ousted long time Yugoslav President Slobodan Milošević in October. Mr. Milošević's Socialist Party…

• Sub noun phrase chunks
• NER
• Capitalized phrases

Page 19: Relational Entity Linking with Cross Document Coreference

19

Traditional Wikification 1 - Mention Segmentation

...ousted long time Yugoslav President Slobodan Milošević in October. Mr. Milošević's Socialist Party…

Obtains nested mentions

Page 20: Relational Entity Linking with Cross Document Coreference

20

Traditional Wikification 2 - Candidate Generation

...ousted long time Yugoslav President Slobodan Milošević in October. Mr. Milošević's Socialist Party…

Mention 3 ("Milošević"):

k   e_3^k
1   Slobodan_Milošević
2   Milošević_(surname)
3   Boki_Milošević
4   Alexander_Milošević

Mention 4 ("Socialist Party"):

k   e_4^k
1   Socialist_Party_(France)
2   Socialist_Party_(Portugal)
3   Socialist_Party_of_America
4   Socialist_Party_(Argentina)

Approach
• Collect known mappings from Wikipedia page titles, hyperlinks…
• Limit to top-K candidates based on frequency of links (Ratinov et al. 2011)
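A minimal sketch of frequency-based candidate generation, assuming the anchor-to-title pairs have already been harvested from Wikipedia. The function names `build_anchor_index` and `generate_candidates` and the link counts in the example are made up for illustration; they are not taken from the system.

```python
from collections import Counter, defaultdict


def build_anchor_index(links):
    """links: iterable of (anchor_text, target_title) pairs, e.g. harvested
    from Wikipedia hyperlinks and page titles."""
    index = defaultdict(Counter)
    for anchor, title in links:
        index[anchor.lower()][title] += 1
    return index


def generate_candidates(index, mention, k=4):
    """Return the top-k titles for a mention, ranked by link frequency."""
    counts = index.get(mention.lower(), Counter())
    return [title for title, _ in counts.most_common(k)]


# Toy data: the counts below are invented for this example.
links = (
    [("Milošević", "Slobodan_Milošević")] * 7
    + [("Milošević", "Milošević_(surname)")] * 2
    + [("Milošević", "Boki_Milošević")]
)
index = build_anchor_index(links)
top2 = generate_candidates(index, "Milošević", k=2)
```

A hard top-K cut is cheap, but any correct title that is rarely linked from this surface form never reaches the ranker.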

Page 21: Relational Entity Linking with Cross Document Coreference

21

Traditional Wikification 3 - Candidate Ranking

...ousted long time Yugoslav President Slobodan Milošević in October. Mr. Milošević's Socialist Party…

Mention 3 ("Milošević"):

k   e_3^k                 s_3^k
1   Slobodan_Milošević    0.7
2   Milošević_(surname)   0.1
3   Boki_Milošević        0.1
4   Alexander_Milošević   0.05

Mention 4 ("Socialist Party"):

k   e_4^k                         s_4^k
1   Socialist_Party_(France)      0.23
2   Socialist_Party_(Portugal)    0.16
3   Socialist_Party_of_America    0.07
4   Socialist_Party_(Argentina)   0.06

Local and global statistical features

Page 22: Relational Entity Linking with Cross Document Coreference

22

Traditional Wikification 4 – Determine NILs

...ousted long time Yugoslav President Slobodan Milošević in October. Mr. Milošević's Socialist Party…

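Mention 3 ("Milošević"), top candidate:

k   e_3^k                s_3^k
1   Slobodan_Milošević   0.7

Mention 4 ("Socialist Party"), top candidate:

k   e_4^k                      s_4^k
1   Socialist_Party_(France)   0.23

Is the top candidate really what the text referred to? Binary classifier

For "Socialist Party", this answer is wrong: we did not generate the correct candidate based on the top-K prior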

Page 24: Relational Entity Linking with Cross Document Coreference

24

Formulation (0)

Intuition: promote pairs of candidate concepts coherent with textual relations

Mubarak, the wife of deposed Egyptian President Hosni Mubarak, …

(Mubarak, wife, Hosni Mubarak)

Page 25: Relational Entity Linking with Cross Document Coreference

25

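Formulation (1)

Formulate as an Integer Linear Program (ILP):

[ILP objective and constraints: formula image not preserved]

e_i^k: whether to output the k-th candidate of the i-th mention
s_i^k: weight to output e_i^k
r_ij^(k,l): whether a relation exists between e_i^k and e_j^l
w_ij^(k,l): weight of a relation

If no relation exists, collapse to the unstructured decision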
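The formula itself did not survive extraction. One ILP that is consistent with the variable definitions on this slide and the next is sketched below. The exact form of the constraints is an assumption and should be checked against Cheng and Roth (EMNLP'13): each mention outputs exactly one candidate, and a relation variable can be active only when both of its candidates are selected.

```latex
\begin{aligned}
\max_{e,\,r}\quad
  & \sum_{i}\sum_{k} s_i^k\, e_i^k
    \;+\; \sum_{i<j}\sum_{k,l} w_{ij}^{(k,l)}\, r_{ij}^{(k,l)} \\
\text{s.t.}\quad
  & e_i^k \in \{0,1\}, \qquad r_{ij}^{(k,l)} \in \{0,1\}, \\
  & \sum_{k} e_i^k = 1 \quad \forall i, \\
  & 2\, r_{ij}^{(k,l)} \le e_i^k + e_j^l \quad \forall i, j, k, l .
\end{aligned}
```

When every relation weight is zero, the second sum vanishes and each mention independently picks its highest-scoring candidate, which matches the remark that the program collapses to the unstructured decision.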

Page 26: Relational Entity Linking with Cross Document Coreference

26

Formulation (2)

...ousted long time Yugoslav President Slobodan Milošević in October. Mr. Milošević's Socialist Party…

Mention 3 ("Milošević"):

k   e_3^k                 s_3^k
1   Slobodan_Milošević    0.7
2   Milošević_(surname)   0.1
3   Boki_Milošević        0.1
4   Alexander_Milošević   0.05

Mention 4 ("Socialist Party"):

k   e_4^k                         s_4^k
1   Socialist_Party_(France)      0.23
2   Socialist_Party_(Portugal)    0.16
3   Socialist_Party_of_America    0.07
4   Socialist_Party_(Argentina)   0.06

Example relation variables between candidates of mentions 3 and 4: r_34^(1,2), r_34^(4,3)

e_i^k: whether a concept is chosen
s_i^k: score of a concept
r_ij^(k,l): whether a relation is present
w_ij^(k,l): score of a relation
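For a handful of mentions, the inference can be illustrated by brute-force enumeration instead of an ILP solver. The sketch below is not the system's solver. The function name `infer`, the relation weight of 1.0, and the use of Socialist_Party_of_Serbia (the candidate that the later Relation Retrieval slide adds with score 0.0) are assumptions chosen to show how one relation can override the local ranking.

```python
from itertools import product


def infer(candidates, relations):
    """candidates: {mention_id: [(title, score), ...]}
    relations: {((mention_i, title_k), (mention_j, title_l)): weight}
    Returns the assignment that maximizes concept scores plus relation weights.
    """
    mentions = sorted(candidates)
    best, best_value = None, float("-inf")
    # Enumerate exactly one candidate per mention.
    for choice in product(*(candidates[m] for m in mentions)):
        assignment = {m: title for m, (title, _) in zip(mentions, choice)}
        value = sum(score for _, score in choice)
        # A relation contributes only if both of its endpoints are selected.
        for ((mi, ti), (mj, tj)), w in relations.items():
            if assignment.get(mi) == ti and assignment.get(mj) == tj:
                value += w
        if value > best_value:
            best, best_value = assignment, value
    return best, best_value


candidates = {
    3: [("Slobodan_Milošević", 0.7), ("Milošević_(surname)", 0.1),
        ("Boki_Milošević", 0.1), ("Alexander_Milošević", 0.05)],
    4: [("Socialist_Party_(France)", 0.23), ("Socialist_Party_(Portugal)", 0.16),
        ("Socialist_Party_of_America", 0.07), ("Socialist_Party_(Argentina)", 0.06),
        ("Socialist_Party_of_Serbia", 0.0)],
}
# Assumed weight for a relation verified in the knowledge base.
relations = {((3, "Slobodan_Milošević"), (4, "Socialist_Party_of_Serbia")): 1.0}

local_only, _ = infer(candidates, {})
with_relation, _ = infer(candidates, relations)
```

Enumeration grows exponentially with the number of mentions, so it only serves to make the objective concrete.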

Page 28: Relational Entity Linking with Cross Document Coreference

28

Overall Approach

Relational Wikification: Candidate Generation, Candidate Ranking, Determine NILs

Relation Analysis: Relation Identification, Relation Retrieval, Relational Inference

Page 29: Relational Entity Linking with Cross Document Coreference

29

Relation Identification

ACE style in-document coreference (Chang et al. ‘13)
• Extract named entity-only coreference relations with high precision

Syntactico-Semantic relations (Chan & Roth ‘10)
• Easy to extract with high precision
• Aim for high recall, as false positives will be filtered
• Sparse, but covers ~80% of relation instances in ACE2004

Type          Example
Premodifier   Iranian Ministry of Defense
Possessive    NYC’s stock exchange
Formulaic     Chicago, Illinois
Preposition   President of the US

Page 30: Relational Entity Linking with Cross Document Coreference

30

Relation Identification

...ousted long time Yugoslav President Slobodan Milošević in October. Mr. Milošević's Socialist Party…

Argument 1           Relation Type   Argument 2
Yugoslav President   apposition      Slobodan Milošević
Slobodan Milošević   coreference     Milošević
Milošević            possessive      Socialist Party

Page 32: Relational Entity Linking with Cross Document Coreference

32

Relation Retrieval

What concepts can “Socialist Party” refer to?
• More robust candidate generation
• Identified relations are verified against a knowledge base (DBpedia)

...ousted long time Yugoslav President Slobodan Milošević in October. Mr. Milošević's Socialist Party…

Page 33: Relational Entity Linking with Cross Document Coreference

33

Relation Retrieval

Query Pruning: only 2 queries per pair are necessary due to the strong baseline.

q1 = (Socialist Party of France, ?, *Milošević*)
q2 = (Slobodan Milošević, ?, *Socialist Party*)

Mention 3 ("Milošević"), top candidate:

k   e_3^k                s_3^k
1   Slobodan_Milošević   0.7

Mention 4 ("Socialist Party"), top candidate:

k   e_4^k                      s_4^k
1   Socialist_Party_(France)   0.23

...ousted long time Yugoslav President Slobodan Milošević in October. Mr. Milošević's Socialist Party…

Page 34: Relational Entity Linking with Cross Document Coreference

34

Relation Retrieval

Argument 1 Relation Type Argument 2

Milošević possessive Socialist Party

Page 35: Relational Entity Linking with Cross Document Coreference

35

Relation Retrieval

...ousted long time Yugoslav President Slobodan Milošević in October. Mr. Milošević's Socialist Party…

Mention 3 ("Milošević"):

k   e_3^k                 s_3^k
1   Slobodan_Milošević    0.7
2   Milošević_(surname)   0.1
3   Boki_Milošević        0.1
4   Alexander_Milošević   0.05

Mention 4 ("Socialist Party"):

k    e_4^k                         s_4^k
1    Socialist_Party_(France)      0.23
2    Socialist_Party_(Portugal)    0.16
3    Socialist_Party_of_America    0.07
4    Socialist_Party_(Argentina)   0.06
21   Socialist_Party_of_Serbia     0.0

r_34^(1,21) = 1
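The two pruned queries can be mimicked against a toy triple store. This is only a sketch: the triples, the predicate name `party`, the function name `retrieve_relations`, and the substring-based wildcard matching are assumptions made for illustration. They are not the DBpedia interface the system uses.

```python
def retrieve_relations(kb_triples, top_arg, other_surface):
    """Find triples that connect a top-ranked concept to any title that
    contains the other mention's surface form (a wildcard-style query)."""
    needle = other_surface.lower().replace(" ", "_")
    hits = []
    for subj, pred, obj in kb_triples:
        if subj == top_arg and needle in obj.lower():
            hits.append((subj, pred, obj))
        elif obj == top_arg and needle in subj.lower():
            hits.append((subj, pred, obj))
    return hits


# Toy knowledge base; triples and predicate names are illustrative only.
kb = [
    ("Slobodan_Milošević", "party", "Socialist_Party_of_Serbia"),
    ("François_Mitterrand", "party", "Socialist_Party_(France)"),
]

# q1 = (Socialist Party of France, ?, *Milošević*)
q1_hits = retrieve_relations(kb, "Socialist_Party_(France)", "Milošević")
# q2 = (Slobodan Milošević, ?, *Socialist Party*)
q2_hits = retrieve_relations(kb, "Slobodan_Milošević", "Socialist Party")
```

In this toy setup only q2 returns a triple. Its other argument, Socialist_Party_of_Serbia, is the title that gets added as a new candidate for the "Socialist Party" mention.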

Page 37: Relational Entity Linking with Cross Document Coreference

37

Relational Inference - coreference

...ousted long time Yugoslav President Slobodan Milošević in October. Mr. Milošević's Socialist Party…

Mention 2 ("Slobodan Milošević"):

k   e_2^k                s_2^k
1   Slobodan_Milošević   1.0

Mention 3 ("Milošević"):

k   e_3^k                 s_3^k
1   Slobodan_Milošević    0.7
2   Milošević_(surname)   0.1
3   Boki_Milošević        0.1
4   Alexander_Milošević   0.05

r_23^(1,1) = 1

Page 38: Relational Entity Linking with Cross Document Coreference

38

Determine unknown concepts (NILs)

How to capture the fact that “Dorothy Byrne” does not refer to any concept in Wikipedia?
• Identify coreferent nominal mention relations
• Generate better features for NIL classifier

Dorothy Byrne, a state coordinator for the Florida Green Party,…

Mention 1 ("Dorothy Byrne"):

k   e_1^k                                s_1^k
1   Dorothy_Byrne_(British_Journalist)   0.6
2   Dorothy_Byrne_(mezzo-soprano)        0.4

Mention 2 ("Florida Green Party"):

k   e_2^k                    s_2^k
1   Green_Party_of_Florida   1.0

Nominal mention: "a state coordinator"

Page 39: Relational Entity Linking with Cross Document Coreference

39

Determine unknown concepts (NILs)

Create NIL candidate for structured inference
• e.g. corrects other coreferent “Dorothy” later in the document

Dorothy Byrne, a state coordinator for the Florida Green Party,…

Mention 1 ("Dorothy Byrne"):

k   e_1^k                                s_1^k
0   NIL                                  1.0
1   Dorothy_Byrne_(British_Journalist)   0.6
2   Dorothy_Byrne_(mezzo-soprano)        0.4

Mention 2 ("Florida Green Party"):

k   e_2^k                    s_2^k
1   Green_Party_of_Florida   1.0

Nominal mention: "a state coordinator"

Page 41: Relational Entity Linking with Cross Document Coreference

41

Cross Document Coreference

NILs can be viewed as KB entries with partial information
• A uniform model for entity representation
• Shared features with Entity Linking system
• Can be supervised using existing EL systems

Cross document coreference cluster example:
Naomi Campbell to give evidence at Charles Taylor trial: spokeswoman.
Supermodel Campbell says 'nothing to gain' from Taylor trial testimony.

Page 42: Relational Entity Linking with Cross Document Coreference

42

Cross Document Coreference Approach

Run document-level coreference
Aggregate all features in a document-level coreferent cluster
Use both mention-level features and document-level features
• String similarity features (NESim, Do et al. ‘09)
• Context TF-IDF similarity features
• Document-level cluster features
Training: using both TAC data and Wikifier generated data
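The clustering step can be pictured as a best-left-link decoder: items are processed in order, and each one either joins the cluster of its best-scoring predecessor or starts a new cluster. The sketch below shows only that decoding rule. The latent left-linking model (Samdani et al. '12) learns its pairwise scorer, whereas the token-overlap `jaccard` scorer, the threshold of 0.3, and the function names here are assumptions for illustration.

```python
def jaccard(a, b):
    """Token-overlap similarity between two non-empty mention strings."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb)


def best_left_link(mentions, score=jaccard, threshold=0.3):
    """Link each mention to its best-scoring predecessor, or start a cluster.

    Returns one cluster id per mention, in input order.
    """
    cluster_of = []
    n_clusters = 0
    for i, mention in enumerate(mentions):
        best_j, best_s = None, threshold
        for j in range(i):
            s = score(mention, mentions[j])
            if s > best_s:
                best_j, best_s = j, s
        if best_j is None:
            # No predecessor is similar enough: open a new cluster.
            cluster_of.append(n_clusters)
            n_clusters += 1
        else:
            cluster_of.append(cluster_of[best_j])
    return cluster_of


clusters = best_left_link(
    ["Naomi Campbell", "Charles Taylor", "Campbell", "Taylor"])
```

With this toy scorer the two "Campbell" mentions share one cluster and the two "Taylor" mentions share another.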

Page 43: Relational Entity Linking with Cross Document Coreference

43

[Figure: Clustering B³ F1 on TAC Entity Linking 2012. Axes: 0% to 100% and 0 to 1. Series: Trivial; L³M No str feat.; L³M; AllLink; SumLink]
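For reference, plain B³ precision, recall, and F1 can be computed as in the sketch below. It assumes both clusterings cover the same items and does not include the KB-link check used in official TAC scoring. The function name `b_cubed` and the toy clusterings are illustrative.

```python
from collections import defaultdict


def b_cubed(predicted, gold):
    """predicted, gold: dicts mapping item id -> cluster id over the same items.
    Returns (precision, recall, f1)."""
    def groups(labels):
        by_cluster = defaultdict(set)
        for item, cluster in labels.items():
            by_cluster[cluster].add(item)
        return by_cluster

    pred_groups, gold_groups = groups(predicted), groups(gold)
    precision = recall = 0.0
    for item in gold:
        p = pred_groups[predicted[item]]   # predicted cluster of this item
        g = gold_groups[gold[item]]        # gold cluster of this item
        overlap = len(p & g)
        precision += overlap / len(p)
        recall += overlap / len(g)
    n = len(gold)
    precision, recall = precision / n, recall / n
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


gold = {"a": 0, "b": 0, "c": 1}
everything_merged = {"a": 0, "b": 0, "c": 0}
p, r, f1 = b_cubed(everything_merged, gold)
```

Merging everything into one cluster keeps recall at 1 but lowers precision to 5/9, so F1 comes out at 5/7.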

Page 44: Relational Entity Linking with Cross Document Coreference

44

[Figure: Clustering B³ F1 on Wikifier generated data. Axes: 0% to 100% and 0 to 1. Series: Trivial; L³M No str feat.; L³M; AllLink; SumLink]

Page 46: Relational Entity Linking with Cross Document Coreference

46

Query mapping reconciliation

Mentions in one cluster and their links:
[Seattle] has won…                    → Seattle (0.7)
[Seattle] Seahawks ended the game…    → Seattle Seahawks (0.8)
… cheered for [Seattle]…              → Seattle (0.2)

• Max: {0.8, 0.7, 0.2} = Seattle Seahawks
• Sum: {0.8, 0.7+0.2} = Seattle
• No Threshold: NIL classifier always outputs “non-NIL”; same as Max otherwise
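The Max and Sum strategies can be written down in a few lines. The function name `reconcile` and the data layout are assumptions made for this sketch. The scores are the ones shown on the slide.

```python
from collections import defaultdict


def reconcile(links, strategy="max"):
    """links: list of (title, score) predictions for the mentions in one
    cluster. Returns the single title assigned to the whole cluster."""
    if strategy == "max":
        # Trust the single most confident mention-level prediction.
        return max(links, key=lambda pair: pair[1])[0]
    if strategy == "sum":
        # Pool evidence: add up the scores of mentions that agree on a title.
        totals = defaultdict(float)
        for title, score in links:
            totals[title] += score
        return max(totals, key=totals.get)
    raise ValueError(f"unknown strategy: {strategy}")


links = [("Seattle", 0.7), ("Seattle Seahawks", 0.8), ("Seattle", 0.2)]
by_max = reconcile(links, "max")
by_sum = reconcile(links, "sum")
```

Max follows the one high-confidence link, while Sum lets two weaker links that agree outvote it.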

Page 48: Relational Entity Linking with Cross Document Coreference

48

Evaluation – TAC KBP 2011 Entity Linking

Run Relational Inference (RI) Wikifier “as-is”: no retraining using TAC data

[Figure: TAC KBP 2011 Entity Linking Performance. Metrics: Micro Average and B³ F1 (axis from 68 to 88). System names: RI, CogComp, LCC, MS_MLI, NUShime, Median*]
*Median of top 14 systems

Page 49: Relational Entity Linking with Cross Document Coreference

49

Evaluation – TAC 2012 Entity Linking Error Analysis

[Figure: B³ F1 (axis from 0 to 0.9) for Max, Sum, Perfect Ranking, Exact Max]

Page 50: Relational Entity Linking with Cross Document Coreference

50

Official 2013 Performance

Page 51: Relational Entity Linking with Cross Document Coreference

51

Official 2013 Performance Break-down: Link Type

Page 52: Relational Entity Linking with Cross Document Coreference

52

Official 2013 Performance Break-down: Doc domain

Page 53: Relational Entity Linking with Cross Document Coreference

53

Official 2013 Performance Break-down: NER type

Page 54: Relational Entity Linking with Cross Document Coreference

54

Conclusion

Importance of linguistic and world knowledge
• Identification of relational information benefits Wikification and Entity Linking

Future work
• Robust preprocessing on noisy input/adapt to EL task requirement
• “Self-supervision” on NIL clustering
• Unified NIL and KB entity representation
• Joint entity typing, coreference and disambiguation
• Incorporate more relations

Demo: http://cogcomp.cs.illinois.edu/demo/wikify
Download: http://cogcomp.cs.illinois.edu/page/download_view/Wikifier

Thank you!

Page 55: Relational Entity Linking with Cross Document Coreference

55

Back up slides

Page 56: Relational Entity Linking with Cross Document Coreference

56

Applications

Knowledge Acquisition via Grounding

Coreference Resolution
• Learning-based multi-sieve co-reference resolution with knowledge (Ratinov et al. 2012)

Information Extraction
• Unsupervised relation discovery with sense disambiguation (Yao et al. 2012)
• Automatic Event Extraction with Structured Preference Modeling (Lu and Roth, 2012)

Text Classification
• Gabrilovich and Markovitch, 2007; Chang et al., 2008

Page 57: Relational Entity Linking with Cross Document Coreference

57

Wikification Performance Result

[Figure: F1 Performance on Wikification datasets (axis from 60 to 95). Datasets: ACE, MSNBC, AQUAINT, Wikipedia. Systems: Milne & Witten, Ratinov & Roth, Relational Inference]