Relational Entity Linking with Cross Document Coreference
Xiao Cheng, Bingling Chen, Rajhans Samdani, Kai-Wei Chang, Zhiye Fei and Dan Roth
University of Illinois at Urbana-Champaign (UI_CCG)
2
Talk Outline
• Introduction
  - Architecture
• Entity Linking Approach
  - Preprocessing
  - Wikification
  - Formulation
  - Relational Analysis
  - Cross Document Coreference
  - Reconciliation
• Evaluation
3
Entity Linking Specification
[Figure: Wikipedia articles split into TAC KB and non-TAC KB entries; y-axis 0 to 5,000,000 articles]
Query:
<query id="EL13_ENG_0015">
  <docid>bolt-eng-DF-170-181137-9030298</docid>
  <name>Lightning Bolts</name>
  <beg>15959</beg>
  <end>15973</end>
</query>
Output:
query_id        link_id
EL13_ENG_0015   NIL0006
EL13_ENG_0016   E0273299
…
EL13_ENG_0821   NIL0006
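A minimal Python sketch of reading one query in the format above with the standard-library `xml.etree.ElementTree`. The `Query` record and the `parse_query` and `mention_span_length` helpers are hypothetical names, and treating `<end>` as an inclusive character offset is an assumption inferred from the example (the 15-character name "Lightning Bolts" spans offsets 15959 to 15973).

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Query:
    query_id: str
    docid: str
    name: str
    beg: int
    end: int  # assumed to be an inclusive character offset


def parse_query(xml_text: str) -> Query:
    """Parse one <query> element into a Query record (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    return Query(
        query_id=root.attrib["id"],
        docid=root.findtext("docid"),
        name=root.findtext("name"),
        beg=int(root.findtext("beg")),
        end=int(root.findtext("end")),
    )


def mention_span_length(q: Query) -> int:
    # With an inclusive end offset the mention covers end - beg + 1 characters.
    return q.end - q.beg + 1


example = ('<query id="EL13_ENG_0015"> <docid>bolt-eng-DF-170-181137-9030298</docid> '
           '<name>Lightning Bolts</name> <beg>15959</beg> <end>15973</end></query>')
q = parse_query(example)
print(q.query_id, q.name, mention_span_length(q))  # EL13_ENG_0015 Lightning Bolts 15
```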
4
Entity Linking using Wikification and Cross-Doc Coref
[Figure: Wikipedia articles split into TAC KB and non-TAC KB entries; y-axis 0 to 5,000,000 articles]
query_id        link_id
EL13_ENG_0015   NIL0006
EL13_ENG_0016   E0273299
…
EL13_ENG_0821   NIL0006
…
EL13_ENG_0937   NIL0288
…
EL13_ENG_1914   NIL0288
Cross Document Coreference
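The output format implies the clustering: queries that share a `link_id` refer to the same entity, whether that id is a TAC KB entry or a NIL cluster. A small sketch using only the rows visible in the table, assuming NIL cluster ids always start with `NIL`; the function names are hypothetical.

```python
from collections import defaultdict


def clusters_from_links(rows):
    """Group query ids that share a link_id; each group is one entity cluster."""
    clusters = defaultdict(list)
    for query_id, link_id in rows:
        clusters[link_id].append(query_id)
    return dict(clusters)


def is_nil(link_id: str) -> bool:
    # Assumption: NIL cluster ids carry a "NIL" prefix, as in the sample output.
    return link_id.startswith("NIL")


rows = [("EL13_ENG_0015", "NIL0006"), ("EL13_ENG_0016", "E0273299"),
        ("EL13_ENG_0821", "NIL0006"), ("EL13_ENG_0937", "NIL0288"),
        ("EL13_ENG_1914", "NIL0288")]
clusters = clusters_from_links(rows)
nil_clusters = {k: v for k, v in clusters.items() if is_nil(k)}
print(nil_clusters)
```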
5
Wikification
Blumenthal (D) is a candidate for the U.S. Senate seat now held by Christopher Dodd (D), and he has held a commanding lead in the race since he entered it. But the Times report has the potential to fundamentally reshape the contest in the Nutmeg State.
6
Wikification Challenges
• Ambiguity
• Concepts outside of the KB (NIL): Blumenthal?
• Variability
• Scale: millions of labels
Blumenthal (D) is a candidate for the U.S. Senate seat now held by Christopher Dodd (D), and he has held a commanding lead in the race since he entered it. But the Times report has the potential to fundamentally reshape the contest in the Nutmeg State.
Connecticut / CT / The Nutmeg State
Times / The New York Times / The Times
7
Key Innovation
• Improved Wikification for structured EL
  - Relational Inference for Linking (Cheng and Roth, EMNLP’13)
  - No retraining
• Non-trivial cross-document clustering
  - Best Latent Left-Linking approach (Samdani et al. ’12)
9
Entity Linking Architecture
Diagram components:
• Input: TAC query
• Preprocessing: query normalization, document transformation, purposeful coreference
• Linking: Wikification produces Wikipedia entities; query mapping turns them into TAC KB entities or NIL
• Cross-doc coreference, supervised by the linking output, produces partial NIL clusters
• Reconcile linking and clusters
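A sketch of the query-mapping step under stated assumptions: `map_query`, the `wiki_to_kb` table, and the KB id `E0000001` are hypothetical, and NIL ids are formatted like the `NIL0006` example. A Wikipedia entity that has no TAC KB entry is reported as NIL, with the cluster index assumed to come from cross-document coreference.

```python
from typing import Optional


def map_query(wiki_title: Optional[str], wiki_to_kb: dict, nil_cluster: int) -> str:
    """Map a Wikified query to a TAC KB id, falling back to a NIL cluster id.

    wiki_title is None when Wikification itself returned NIL.
    """
    if wiki_title is not None and wiki_title in wiki_to_kb:
        return wiki_to_kb[wiki_title]
    # No TAC KB entry (or no Wikipedia page at all): report the cross-doc NIL cluster.
    return f"NIL{nil_cluster:04d}"


# Hypothetical mapping table; real ids would come from the TAC KB.
wiki_to_kb = {"Slobodan_Milošević": "E0000001"}
print(map_query("Slobodan_Milošević", wiki_to_kb, 6))         # E0000001
print(map_query("Socialist_Party_of_Serbia", wiki_to_kb, 6))  # NIL0006
```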
11
Preprocessing
• Query normalization: handling spelling mistakes and slang (one of the reasons we did not achieve the expected performance)
• In-document coreference: some coreferent mentions are easier to link than the query mention
Examples of query variants: Obomber, Obamadinejad, Osama Obama, Nobama, Obambi, Obamination, ObaMao, Owe Bama, 0bama, O-balm-a, O-bomb-a
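One simple way to attack such variants, sketched with Python's `difflib`: snap the query to the closest name in a lexicon when the similarity ratio clears a cutoff, and otherwise prefer the longest coreferent mention in the document. The lexicon, the 0.75 cutoff, and the longest-mention heuristic are assumptions for illustration, not the system's actual normalizer.

```python
import difflib


def normalize_query(name, known_names, cutoff=0.75):
    """Snap a misspelled query to the closest known name; otherwise keep it unchanged."""
    matches = difflib.get_close_matches(name, known_names, n=1, cutoff=cutoff)
    return matches[0] if matches else name


def most_informative_mention(chain):
    """Among in-document coreferent mentions, prefer the one with the most tokens
    (a hypothetical heuristic for 'easiest to link')."""
    return max(chain, key=lambda m: len(m.split()))


lexicon = ["Obama", "Osama"]  # toy lexicon
print(normalize_query("0bama", lexicon))    # Obama
print(normalize_query("Obomber", lexicon))  # Obomber (left unchanged)
print(most_informative_mention(["Obama", "Barack Obama", "he"]))  # Barack Obama
```

Character similarity catches typos such as "0bama" (ratio 0.8 against "Obama") but leaves slang such as "Obomber" untouched, which is why such queries remain hard.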
12
Preprocessing
• Document transformation: a single query's document can be as long as 100k characters
• Documents need to be truncated while minimizing the loss of critical context
Original document → Opening + Query Context + Coreferent Context
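A sketch of this transformation as a character-budgeted truncation: always keep a window around the query, then add the opening and windows around coreferent mentions while they fit. The function name, window sizes, and the 4,000-character budget are hypothetical parameters.

```python
def truncate_document(text, query_span, coref_spans, opening=500, window=200, budget=4000):
    """Keep the query context, the opening, and coreferent contexts within a character budget.

    Spans are (begin, end) character offsets with an exclusive end, as in Python slicing.
    """
    n = len(text)

    def around(b, e):
        return (max(0, b - window), min(n, e + window))

    # The query context is always kept; other pieces are added in priority order while they fit.
    kept = [around(*query_span)]
    used = kept[0][1] - kept[0][0]
    for b, e in [(0, min(n, opening))] + [around(b, e) for b, e in coref_spans]:
        if used + (e - b) <= budget:  # overlaps are counted twice, which is conservative
            kept.append((b, e))
            used += e - b
    # Merge overlapping pieces and restore document order.
    merged = []
    for b, e in sorted(kept):
        if merged and b <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((b, e))
    return " ... ".join(text[b:e] for b, e in merged)
```

Giving the query window top priority means that even a tiny budget never drops the mention being linked.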
14
Wikification Bottleneck
• State-of-the-art Wikification systems (Ratinov et al. 2011) can achieve the above with local and global statistical features
• They reach a bottleneck at around 70% to 85% F1 on non-Wikipedia datasets
• What is missing?
Blumenthal (D) is a candidate for the U.S. Senate seat now held by Christopher Dodd (D), and he has held a commanding lead in the race since he entered it. But the Times report has the potential to fundamentally reshape the contest in the Nutmeg State.
15
Motivating Example
Mubarak, the wife of deposed Egyptian President Hosni Mubarak, …
Bag of words: Mubarak, wife, Egyptian, President, Hosni, Mubarak
• What are we missing with Bag of Words (BOW) models? Who is Mubarak?
• Constraining interaction between concepts: (Mubarak, wife, Hosni Mubarak)
16
Relational Inference for Wikification
Our contribution:
• Identify key textual relations for Wikification
• A global inference framework to incorporate relational knowledge
• Significant improvement over state-of-the-art Wikification systems
Mubarak, the wife of deposed Egyptian President Hosni Mubarak, …
(Mubarak, wife, Hosni Mubarak)
17
Traditional Wikification Pipeline
1. Mention Segmentation
2. Candidate Generation
3. Candidate Ranking
4. Determine NILs (NIL Linking)
18
Traditional Wikification 1 - Mention Segmentation
...ousted long time Yugoslav President Slobodan Milošević in October. Mr. Milošević's Socialist Party…
Mention sources: sub-noun-phrase chunks, NER, capitalized phrases
19
Traditional Wikification 1 - Mention Segmentation
...ousted long time Yugoslav President Slobodan Milošević in October. Mr. Milošević's Socialist Party…
Obtains nested mentions
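A crude sketch of mention segmentation that yields nested mentions: group consecutive capitalized tokens into runs, then emit every contiguous sub-span. Capitalization here stands in for NER and sub-noun-phrase chunking, and the honorific list and punctuation rules are assumptions.

```python
import string

HONORIFICS = {"Mr.", "Mrs.", "Ms.", "Dr."}  # assumed list of titles to drop
STRIP = string.punctuation + "…"


def capitalized_runs(text):
    """Group consecutive capitalized tokens (a crude stand-in for NER and NP chunking)."""
    runs, current = [], []

    def flush():
        if current:
            runs.append(list(current))
            current.clear()

    for raw in text.split():
        if raw in HONORIFICS:  # drop titles such as "Mr." and break the run
            flush()
            continue
        token = raw.strip(STRIP)
        if token.endswith(("'s", "’s")):
            token = token[:-2]
        if token and token[0].isupper():
            current.append(token)
        else:
            flush()
        if raw.endswith((".", ",", "'s", "’s", "…")):
            flush()  # sentence ends, commas, and possessives close a mention
    flush()
    return runs


def nested_mentions(run):
    """All contiguous sub-spans of a run, e.g. both 'Slobodan Milošević' and 'Milošević'."""
    return [" ".join(run[i:j]) for i in range(len(run)) for j in range(i + 1, len(run) + 1)]


sentence = ("...ousted long time Yugoslav President Slobodan Milošević in October. "
            "Mr. Milošević's Socialist Party…")
for run in capitalized_runs(sentence):
    print(nested_mentions(run))
```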
20
Traditional Wikification 2 - Candidate Generation
...ousted long time Yugoslav President Slobodan Milošević in October. Mr. Milošević's Socialist Party…
k | e_4^k
1 | Socialist_Party_(France)
2 | Socialist_Party_(Portugal)
3 | Socialist_Party_of_America
4 | Socialist_Party_(Argentina)
…
k | e_3^k
1 | Slobodan_Milošević
2 | Milošević_(surname)
3 | Boki_Milošević
4 | Alexander_Milošević
…
Approach: collect known mappings from Wikipedia page titles, hyperlinks…; limit to the top-K candidates based on link frequency (Ratinov et al. 2011)
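Candidate generation with a link-frequency prior can be sketched as counting (anchor text, target title) pairs and keeping the K most frequent targets. The function names are hypothetical, and the anchor counts below are invented toy data, chosen so that the correct entity for this example falls outside the top K.

```python
from collections import Counter, defaultdict


def build_link_counts(anchor_links):
    """anchor_links: (anchor_text, target_title) pairs harvested from Wikipedia titles and hyperlinks."""
    counts = defaultdict(Counter)
    for anchor, title in anchor_links:
        counts[anchor.lower()][title] += 1
    return counts


def top_k_candidates(mention, counts, k):
    """Return up to k candidate titles with their link-frequency prior P(title | mention)."""
    c = counts.get(mention.lower(), Counter())
    total = sum(c.values())
    return [(title, n / total) for title, n in c.most_common(k)]


# Toy counts (invented for illustration).
anchors = ([("Socialist Party", "Socialist_Party_(France)")] * 5
           + [("Socialist Party", "Socialist_Party_(Portugal)")] * 3
           + [("Socialist Party", "Socialist_Party_of_Serbia")] * 1)
counts = build_link_counts(anchors)
print(top_k_candidates("Socialist Party", counts, k=2))
```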
21
Traditional Wikification 3 - Candidate Ranking
...ousted long time Yugoslav President Slobodan Milošević in October. Mr. Milošević's Socialist Party…
k | e_4^k | s_4^k
1 | Socialist_Party_(France) | 0.23
2 | Socialist_Party_(Portugal) | 0.16
3 | Socialist_Party_of_America | 0.07
4 | Socialist_Party_(Argentina) | 0.06
…
k | e_3^k | s_3^k
1 | Slobodan_Milošević | 0.7
2 | Milošević_(surname) | 0.1
3 | Boki_Milošević | 0.1
4 | Alexander_Milošević | 0.05
…
Local and global statistical features
22
Traditional Wikification 4 – Determine NILs
...ousted long time Yugoslav President Slobodan Milošević in October. Mr. Milošević's Socialist Party…
k | e_4^k | s_4^k
1 | Socialist_Party_(France) | 0.23
k | e_3^k | s_3^k
1 | Slobodan_Milošević | 0.7
Is the top candidate really what the text referred to? A binary classifier decides.
This answer is wrong: we did not generate the correct candidate (Socialist_Party_of_Serbia) based on the top-K prior.
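A sketch of the NIL decision as a binary classifier over the ranked list. The two features (top score and its margin over the runner-up), the weights, and the bias are hypothetical; a trained classifier would use many more features.

```python
import math


def nil_features(ranked):
    """Features of the top candidate: its score and its margin over the runner-up."""
    top = ranked[0][1]
    second = ranked[1][1] if len(ranked) > 1 else 0.0
    return [top, top - second]


def link_or_nil(ranked, weights=(6.0, 6.0), bias=-3.0):
    """Hypothetical logistic NIL classifier: keep the top candidate only if P(correct) >= 0.5."""
    z = bias + sum(w * f for w, f in zip(weights, nil_features(ranked)))
    p_correct = 1.0 / (1.0 + math.exp(-z))
    return ranked[0][0] if p_correct >= 0.5 else "NIL"


socialist = [("Socialist_Party_(France)", 0.23), ("Socialist_Party_(Portugal)", 0.16)]
milosevic = [("Slobodan_Milošević", 0.7), ("Milošević_(surname)", 0.1)]
print(link_or_nil(socialist))  # NIL
print(link_or_nil(milosevic))  # Slobodan_Milošević
```

On the "Socialist Party" list the sketch answers NIL, which is still wrong: the correct entity was never generated, so no decision over the top-K list can recover it.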
24
Formulation (0)
Intuition: promote pairs of candidate concepts that are coherent with textual relations
Mubarak, the wife of deposed Egyptian President Hosni Mubarak, …
(Mubarak, wife, Hosni Mubarak)
25
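Formulation (1)
Formulate as an Integer Linear Program (ILP):
$$\Gamma = \arg\max_{e,\,r} \; \sum_{i}\sum_{k} s_i^k e_i^k + \sum_{i,j}\sum_{k,l} w_{ij}^{(k,l)} r_{ij}^{(k,l)}$$
where e_i^k indicates whether to output the k-th candidate of the i-th mention, s_i^k is the weight for outputting it, r_{ij}^{(k,l)} indicates whether a relation exists between e_i^k and e_j^l, and w_{ij}^{(k,l)} is the weight of that relation.
If no relation exists, the problem collapses to the unstructured decision.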
26
Formulation (2)
...ousted long time Yugoslav President Slobodan Milošević in October. Mr. Milošević's Socialist Party…
k | e_4^k | s_4^k
1 | Socialist_Party_(France) | 0.23
2 | Socialist_Party_(Portugal) | 0.16
3 | Socialist_Party_of_America | 0.07
4 | Socialist_Party_(Argentina) | 0.06
…
k | e_3^k | s_3^k
1 | Slobodan_Milošević | 0.7
2 | Milošević_(surname) | 0.1
3 | Boki_Milošević | 0.1
4 | Alexander_Milošević | 0.05
…
Example relation variables: r_{34}^{(1,2)}, r_{34}^{(4,3)}
e_i^k: whether a concept is chosen
s_i^k: score of a concept
r_{ij}^{(k,l)}: whether a relation is present
w_{ij}^{(k,l)}: score of a relation
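With only two mentions, the ILP can be solved exactly by enumerating assignments. This sketch assumes the usual constraints (exactly one output per mention; a relation variable may be 1 only when both of its endpoint concepts are chosen), adds Socialist_Party_of_Serbia as an extra candidate for mention 4 with score 0.0, and uses a hypothetical relation weight of 0.5. Enumeration is exponential in the number of mentions, so a real system would hand the program to an ILP solver.

```python
from itertools import product


def relational_inference(scores, relations):
    """Solve the small ILP exactly by enumeration.

    scores:    {mention i: {candidate k: s_i^k}}
    relations: {((i, k), (j, l)): w_ij^(k,l)}
    Assumed constraints: each mention outputs exactly one candidate, and r_ij^(k,l)
    can be 1 only when both e_i^k and e_j^l are 1, so an active relation adds max(w, 0).
    """
    mentions = sorted(scores)
    best, best_value = None, float("-inf")
    for choice in product(*(list(scores[m]) for m in mentions)):
        e = dict(zip(mentions, choice))  # e[i] = the candidate k with e_i^k = 1
        value = sum(scores[i][e[i]] for i in mentions)
        for ((i, k), (j, l)), w in relations.items():
            if e.get(i) == k and e.get(j) == l:
                value += max(w, 0.0)
        if value > best_value:
            best, best_value = e, value
    return best, best_value


scores = {
    3: {"Slobodan_Milošević": 0.7, "Milošević_(surname)": 0.1,
        "Boki_Milošević": 0.1, "Alexander_Milošević": 0.05},
    4: {"Socialist_Party_(France)": 0.23, "Socialist_Party_(Portugal)": 0.16,
        "Socialist_Party_of_America": 0.07, "Socialist_Party_(Argentina)": 0.06,
        "Socialist_Party_of_Serbia": 0.0},
}
relations = {((3, "Slobodan_Milošević"), (4, "Socialist_Party_of_Serbia")): 0.5}  # hypothetical weight
print(relational_inference(scores, relations))
```

Without the relation, the best pair is Slobodan_Milošević with Socialist_Party_(France) (0.93); with it, Socialist_Party_of_Serbia wins (1.2).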
28
Overall Approach
• Relational Wikification: Candidate Generation, Candidate Ranking, Determine NILs
• Relation Analysis: Relation Identification, Relation Retrieval, Relational Inference
29
Relation Identification
• ACE-style in-document coreference (Chang et al. ’13): extract named-entity-only coreference relations with high precision
• Syntactico-semantic relations (Chan & Roth ’10):
  - Easy to extract with high precision
  - Aim for high recall, as false positives will be filtered
  - Sparse, but cover ~80% of relation instances in ACE2004
Type | Example
Premodifier | Iranian Ministry of Defense
Possessive | NYC’s stock exchange
Formulaic | Chicago, Illinois
Preposition | President of the US
30
Relation Identification
...ousted long time Yugoslav President Slobodan Milošević in October. Mr. Milošević's Socialist Party…
Argument 1 | Relation Type | Argument 2
Yugoslav President | apposition | Slobodan Milošević
Slobodan Milošević | coreference | Milošević
Milošević | possessive | Socialist Party
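A toy sketch of surface patterns for three of the relation types in the table above, using Python regular expressions over runs of capitalized words. These regexes are assumptions for illustration only; they are not the Chan & Roth (2010) extractor, which relies on syntactic analysis, and `[A-Z]` only recognizes ASCII capitals at the start of a word.

```python
import re

NAME = r"[A-Z]\w*(?: [A-Z]\w*)*"  # a run of capitalized words (\w matches Unicode letters)
PATTERNS = [
    ("possessive", re.compile(rf"({NAME})['’]s ({NAME})")),
    ("formulaic", re.compile(rf"({NAME}), ({NAME})")),
    ("preposition", re.compile(rf"({NAME}) of (?:the )?({NAME})")),
]


def identify_relations(text):
    """Return (argument 1, relation type, argument 2) triples matched by the toy patterns."""
    found = []
    for rel_type, pattern in PATTERNS:
        for m in pattern.finditer(text):
            found.append((m.group(1), rel_type, m.group(2)))
    return found


print(identify_relations("Mr. Milošević's Socialist Party"))
# [('Milošević', 'possessive', 'Socialist Party')]
```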
32
Relation Retrieval
• What concepts can “Socialist Party” refer to? Retrieving relations makes candidate generation more robust
• Identified relations are verified against a knowledge base (DBpedia)
...ousted long time Yugoslav President Slobodan Milošević in October. Mr. Milošević's Socialist Party…
33
Relation Retrieval
Query pruning: only 2 queries per pair are necessary, thanks to the strong baseline.
q1 = (Socialist Party of France, ?, *Milošević*)
q2 = (Slobodan Milošević, ?, *Socialist Party*)
k | e_4^k | s_4^k
1 | Socialist_Party_(France) | 0.23
…
k | e_3^k | s_3^k
1 | Slobodan_Milošević | 0.7
…
...ousted long time Yugoslav President Slobodan Milošević in October. Mr. Milošević's Socialist Party…
34
Relation Retrieval
Argument 1 | Relation Type | Argument 2
Milošević | possessive | Socialist Party
35
Relation Retrieval
...ousted long time Yugoslav President Slobodan Milošević in October. Mr. Milošević's Socialist Party…
k | e_4^k | s_4^k
1 | Socialist_Party_(France) | 0.23
2 | Socialist_Party_(Portugal) | 0.16
3 | Socialist_Party_of_America | 0.07
4 | Socialist_Party_(Argentina) | 0.06
…
21 | Socialist_Party_of_Serbia | 0.0
k | e_3^k | s_3^k
1 | Slobodan_Milošević | 0.7
2 | Milošević_(surname) | 0.1
3 | Boki_Milošević | 0.1
4 | Alexander_Milošević | 0.05
…
r_{34}^{(1,21)} = 1
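Relation retrieval can be sketched as a wildcard lookup over (subject, predicate, object) triples: answer q2 = (Slobodan Milošević, ?, *Socialist Party*) and add every matching object as a new candidate. The in-memory `KB` list, its predicate names, and the helper functions are hypothetical stand-ins for DBpedia, not its actual query interface.

```python
from fnmatch import fnmatchcase

# Toy triples standing in for DBpedia; the predicate names are illustrative assumptions.
KB = [
    ("Slobodan_Milošević", "party", "Socialist_Party_of_Serbia"),
    ("Slobodan_Milošević", "birthPlace", "Požarevac"),
]


def retrieve(subject, surface, kb):
    """Answer (subject, ?, *surface*): relations from subject to any title containing surface."""
    pattern = "*" + surface.replace(" ", "_") + "*"
    return [(s, p, o) for s, p, o in kb if s == subject and fnmatchcase(o, pattern)]


def add_retrieved_candidates(candidates, hits):
    """Add retrieved objects as new candidates (score 0.0 if unseen) and return the relations."""
    relations = []
    for s, _, o in hits:
        candidates.setdefault(o, 0.0)
        relations.append((s, o))
    return relations


socialist = {"Socialist_Party_(France)": 0.23, "Socialist_Party_(Portugal)": 0.16}
hits = retrieve("Slobodan_Milošević", "Socialist Party", KB)
print(add_retrieved_candidates(socialist, hits))  # [('Slobodan_Milošević', 'Socialist_Party_of_Serbia')]
```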
37
Relational Inference - coreference
...ousted long time Yugoslav President Slobodan Milošević in October. Mr. Milošević's Socialist Party…
k | e_2^k | s_2^k
1 | Slobodan_Milošević | 1.0
k | e_3^k | s_3^k
1 | Slobodan_Milošević | 0.7
2 | Milošević_(surname) | 0.1
3 | Boki_Milošević | 0.1
4 | Alexander_Milošević | 0.05
…
r_{23}^{(1,1)} = 1
38
Determine unknown concepts (NILs)
How can we capture the fact that “Dorothy Byrne” does not refer to any concept in Wikipedia?
• Identify coreferent nominal mention relations
• Generate better features for the NIL classifier
Dorothy Byrne, a state coordinator for the Florida Green Party,…
k | e_2^k | s_2^k
1 | Green_Party_of_Florida | 1.0
k | e_1^k | s_1^k
1 | Dorothy_Byrne_(British_Journalist) | 0.6
2 | Dorothy_Byrne_(mezzo-soprano) | 0.4
Nominal mention: “a state coordinator for the Florida Green Party”
39
Determine unknown concepts (NILs)
• Create a NIL candidate for structured inference; e.g., this corrects other coreferent mentions of “Dorothy” later in the document
Dorothy Byrne, a state coordinator for the Florida Green Party,…
k | e_2^k | s_2^k
1 | Green_Party_of_Florida | 1.0
k | e_1^k | s_1^k
0 | NIL | 1.0
1 | Dorothy_Byrne_(British_Journalist) | 0.6
2 | Dorothy_Byrne_(mezzo-soprano) | 0.4
Nominal mention: “a state coordinator for the Florida Green Party”
41
Cross Document Coreference
• NILs can be viewed as KB entries with partial information
• A uniform model for entity representation
• Shared features with the Entity Linking system
• Can be supervised using existing EL systems
Cross-document coreference cluster example:
Naomi Campbell to give evidence at Charles Taylor trial: spokeswoman.
Supermodel Campbell says 'nothing to gain' from Taylor trial testimony.
42
Cross Document Coreference Approach
• Run document-level coreference
• Aggregate all features in a document-level coreferent cluster
• Use both mention-level and document-level features:
  - String similarity features (NESim, Do et al. ’09)
  - Context TF-IDF similarity features
  - Document-level cluster features
• Training: using both TAC data and Wikifier-generated data
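A sketch of best-left-link clustering in the spirit of the Latent Left-Linking model: mentions are processed in order, and each links to its highest-scoring earlier mention if that score exceeds a threshold; otherwise it starts a new cluster. The token-overlap scorer and the 0.5 threshold are hypothetical stand-ins for the learned combination of string, TF-IDF, and cluster features.

```python
def token_overlap(a, b):
    """Toy pairwise score: share of the shorter name's tokens that also occur in the other name."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / min(len(ta), len(tb))


def best_left_link(mentions, score, threshold=0.5):
    """Link each mention to its best-scoring predecessor above threshold, else open a new cluster."""
    cluster_of = []
    for i, m in enumerate(mentions):
        best_j, best_s = None, threshold
        for j in range(i):
            s = score(mentions[j], m)
            if s > best_s:
                best_j, best_s = j, s
        cluster_of.append(cluster_of[best_j] if best_j is not None else i)
    clusters = {}
    for i, c in enumerate(cluster_of):
        clusters.setdefault(c, []).append(mentions[i])
    return list(clusters.values())


print(best_left_link(["Naomi Campbell", "Charles Taylor", "Campbell", "Taylor"], token_overlap))
# [['Naomi Campbell', 'Campbell'], ['Charles Taylor', 'Taylor']]
```

Because each mention keeps only its single best left link, the clusters are the connected components of those links, which is cheaper than scoring every mention against every full cluster.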
43
[Figure: Clustering B³ F1 on TAC Entity Linking 2012 (y-axis 0 to 1) for Trivial, L³M without string features, L³M, AllLink, and SumLink]
44
[Figure: Clustering B³ F1 on Wikifier-generated data (y-axis 0 to 1) for Trivial, L³M without string features, L³M, AllLink, and SumLink]
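The charts report B³ F1. For reference, B³ averages per-item precision and recall: an item's precision is the fraction of its predicted cluster that shares its gold cluster, and its recall is the fraction of its gold cluster that shares its predicted cluster. A minimal sketch, assuming the predicted and gold assignments cover the same items; `b_cubed` is a hypothetical name.

```python
def b_cubed(pred, gold):
    """pred, gold: dict item -> cluster label. Returns (precision, recall, F1)."""
    def members(assign):
        groups = {}
        for item, label in assign.items():
            groups.setdefault(label, set()).add(item)
        return {item: groups[label] for item, label in assign.items()}

    p_of, g_of = members(pred), members(gold)
    items = list(gold)
    precision = sum(len(p_of[i] & g_of[i]) / len(p_of[i]) for i in items) / len(items)
    recall = sum(len(p_of[i] & g_of[i]) / len(g_of[i]) for i in items) / len(items)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Putting everything in one cluster keeps recall at 1 but costs precision.
print(b_cubed({"a": 1, "b": 1, "c": 1}, {"a": "X", "b": "X", "c": "Y"}))
```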
46
Query mapping reconciliation
Example: three mentions in one cluster and their linking scores
[Seattle] has won… → Seattle (0.7)
[Seattle] Seahawks ended the game… → Seattle Seahawks (0.8)
… cheered for [Seattle]… → Seattle (0.2)
• Max: max{0.8, 0.7, 0.2} → Seattle Seahawks
• Sum: max{0.8, 0.7 + 0.2} → Seattle
• No Threshold: the NIL classifier always outputs “non-NIL”; same as Max otherwise
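The Max and Sum rules can be sketched on the Seattle example; the per-mention (entity, score) pairs come from the slide, while the function names are hypothetical.

```python
from collections import defaultdict


def reconcile_max(links):
    """links: (entity, score) pairs for the mentions in one cluster; keep the single best link."""
    return max(links, key=lambda x: x[1])[0]


def reconcile_sum(links):
    """Add up scores per entity across the cluster and keep the entity with the largest total."""
    totals = defaultdict(float)
    for entity, score in links:
        totals[entity] += score
    return max(totals, key=totals.get)


cluster = [("Seattle", 0.7), ("Seattle Seahawks", 0.8), ("Seattle", 0.2)]
print(reconcile_max(cluster))  # Seattle Seahawks
print(reconcile_sum(cluster))  # Seattle
```

Sum lets several moderately confident mentions outvote one confident mention, whereas Max trusts the single highest score.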
48
Evaluation – TAC KBP 2011 Entity Linking
Run the Relational Inference (RI) Wikifier “as-is”: no retraining on TAC data
[Figure: TAC KBP 2011 Entity Linking performance, micro-average and B³ F1 (y-axis 68 to 88), for RI, CogComp, LCC, MS_MLI, NUShime, and *Median (median of the top 14 systems)]
49
Evaluation – TAC 2012 Entity Linking Error Analysis
[Figure: B³ F1 (y-axis 0 to 0.9) for the Max, Sum, Perfect Ranking, and Exact Max settings]
50
Official 2013 Performance
51
Official 2013 Performance Breakdown: Link Type
52
Official 2013 Performance Breakdown: Doc Domain
53
Official 2013 Performance Breakdown: NER Type
54
Conclusion
• Importance of linguistic and world knowledge
• Identification of relational information benefits Wikification and Entity Linking
Future work:
• Robust preprocessing on noisy input; adapt to EL task requirements
• “Self-supervision” on NIL clustering
• Unified NIL and KB entity representation
• Joint entity typing, coreference and disambiguation
• Incorporate more relations
Demo: http://cogcomp.cs.illinois.edu/demo/wikify
Download: http://cogcomp.cs.illinois.edu/page/download_view/Wikifier
Thank you!
55
Backup Slides
56
Applications
• Knowledge Acquisition via Grounding
• Coreference Resolution: Learning-based multi-sieve co-reference resolution with knowledge (Ratinov et al. 2012)
• Information Extraction: Unsupervised relation discovery with sense disambiguation (Yao et al. 2012); Automatic Event Extraction with Structured Preference Modeling (Lu and Roth, 2012)
• Text Classification (Gabrilovich and Markovitch, 2007; Chang et al., 2008)
57
Wikification Performance Result
[Figure: F1 performance on Wikification datasets (ACE, MSNBC, AQUAINT, Wikipedia; y-axis 60 to 95) for Milne & Witten, Ratinov & Roth, and Relational Inference]