Knowledge Gaps for Entity Linking

Heng Ji

Outline

• Relation Clustering
• Remaining Challenges for Entity Linking

Relation Clustering: The paper from ten years ago (Hasegawa et al., 2004)

• Relation Clustering
• Remaining Challenges for Entity Linking

Relation Discovery Overview

• Assume that pairs of entities occurring in similar contexts can be clustered, and that each pair in a cluster is an instance of the same relation.
o 1. Tag NEs in text corpora.
o 2. Get co-occurrence pairs of NEs and their contexts.
o 3. Measure context similarities among pairs of NEs.
o 4. Make clusters of pairs of NEs.
o 5. Label each cluster of pairs of NEs.

• Run the NE tagger and get all context words within a certain distance. If the context words of the A-B pair and the C-D pair are similar, the two pairs are placed into the same cluster (the same relation); in this example, the relation is merger and acquisition.

Relation Discovery

• NE tagging: use the extended NE tagger (Sekine, 2001) to detect useful relations.

• Collect intervening words between two NEs for each co-occurrence.
o Two NEs are considered to co-occur if they appear within the same sentence and are separated by at most N intervening words.
o Different orders are considered different contexts. That is, e1…e2 and e2…e1 are collected as different contexts.
o Passive voice: collect the base forms of words, as stemmed by a POS tagger, but distinguish verb past participles from other verb forms.

• Less frequent pairs of NEs should be eliminated.
o Set a frequency threshold.
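
The collection step above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the original implementation: the token representation (plain strings for words, `(name, type)` tuples for NEs), the function name `collect_contexts`, and the defaults for `max_gap` (the slide's N) and `min_freq` are all hypothetical. It pairs only adjacent NEs in a sentence and leaves out the stemming and past-participle handling.

```python
from collections import defaultdict


def collect_contexts(sentences, max_gap=5, min_freq=2):
    """Map each NE pair to its list of (direction, intervening words).

    Hypothetical input format: each sentence is a list whose items are
    either a str (ordinary word) or a (name, type) tuple (an NE mention).
    """
    contexts = defaultdict(list)
    for sent in sentences:
        ne_positions = [i for i, tok in enumerate(sent) if isinstance(tok, tuple)]
        # Simplification: only adjacent NE mentions are paired.
        for a, b in zip(ne_positions, ne_positions[1:]):
            between = [w.lower() for w in sent[a + 1:b]]
            if len(between) > max_gap:
                continue  # separated by more than N intervening words
            first, second = sent[a], sent[b]
            key = tuple(sorted((first, second)))
            # +1 when the mention order matches the key (e1...e2), -1 otherwise.
            direction = 1 if key == (first, second) else -1
            contexts[key].append((direction, between))
    # Frequency threshold: drop NE pairs that co-occur too rarely.
    return {k: v for k, v in contexts.items() if len(v) >= min_freq}
```

Keeping the direction next to each context, rather than discarding it, is what lets a later step treat e1…e2 and e2…e1 as different contexts.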

Relation Discovery

• Calculate similarity between the sets of contexts of NE pairs.
o Vector space model and cosine similarity
o Only compare NE pairs which have the same types, e.g., one PERSON-GPE pair and another PERSON-GPE pair.
o Eliminate stop words, words in parallel expressions, and expressions peculiar to particular source documents.

• A context vector for each NE pair consists of the bag of words formed from all intervening words from all co-occurrences of the two NEs.
o Different orders: if a word w_i occurred L times in e1…e2 and M times in e2…e1, the tf_i of w_i is defined as L - M.
o If the norm |α| is small due to the lack of context words, the similarity might be unreliable, so define a threshold to eliminate short context vectors.
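
As an illustration of the signed term frequency and the norm threshold, here is a minimal Python sketch. The function names, the tiny `STOP_WORDS` set, and the `min_norm` default are assumptions made for this example. Only the L - M term frequency from the slide is used; any further term weighting is left out.

```python
import math
from collections import Counter

STOP_WORDS = {"the", "a", "of", "by", "was", "is"}  # illustrative only


def context_vector(occurrences):
    """Build a signed bag of words from (direction, words) occurrences.

    direction is +1 for an e1...e2 context and -1 for e2...e1, so a word
    seen L times in one order and M times in the other gets tf = L - M.
    """
    vec = Counter()
    for direction, words in occurrences:
        for w in words:
            if w not in STOP_WORDS:
                vec[w] += direction
    return {w: tf for w, tf in vec.items() if tf != 0}


def norm(vec):
    return math.sqrt(sum(tf * tf for tf in vec.values()))


def cosine(u, v, min_norm=1.0):
    """Cosine similarity, or None when either vector is too short to trust."""
    nu, nv = norm(u), norm(v)
    if nu < min_norm or nv < min_norm:
        return None
    dot = sum(tf * v.get(w, 0) for w, tf in u.items())
    return dot / (nu * nv)
```

With signed frequencies the cosine ranges over [-1, 1]: two pairs that share context words but in opposite orders get a negative score.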

Relation Discovery

• We can cluster the NE pairs based on the similarity among their context vectors.
o We do not know the number of clusters in advance, so we adopt hierarchical clustering.
o Using complete linkage

• Label the cluster with the most frequent word in all combinations of the NE pairs in the same cluster.
o The frequencies are normalized.
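
The clustering and labeling steps can be sketched as follows. This is one possible reading of the slide, not the original code: `complete_linkage`, `label_cluster`, the `threshold` parameter, and the `words_of` mapping (NE pair to its set of context words) are hypothetical names, and labeling is interpreted as "the word shared by the largest fraction of NE-pair combinations in the cluster".

```python
from collections import Counter
from itertools import combinations


def complete_linkage(items, sim, threshold):
    """Agglomerative clustering that stops when no merge reaches threshold."""
    clusters = [[x] for x in items]
    while len(clusters) > 1:
        best, best_pair = None, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Complete linkage: a cluster pair is only as similar as
                # its least similar cross-cluster members.
                s = min(sim(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or s > best:
                    best, best_pair = s, (i, j)
        if best < threshold:
            break
        i, j = best_pair
        merged = clusters.pop(j)  # j > i, so index i stays valid
        clusters[i].extend(merged)
    return clusters


def label_cluster(cluster, words_of):
    """Return (word, normalized frequency) or None if nothing is shared."""
    combos = list(combinations(cluster, 2))
    shared = Counter()
    for a, b in combos:
        shared.update(set(words_of[a]) & set(words_of[b]))
    if not shared:
        return None
    # Most frequent shared word; ties are broken alphabetically.
    word, count = min(shared.items(), key=lambda kv: (-kv[1], kv[0]))
    return word, count / len(combos)
```

Stopping at a similarity threshold, instead of asking for a fixed number of clusters, matches the point that the number of relations is not known in advance. Any pairwise similarity function can be passed as `sim`.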

Discussions

• How will embeddings play a role here?

• Did/Will we make fundamental changes to this “old” framework?

Outline

• Relation Clustering
• Remaining Challenges for Entity Linking (10% Errors for News and 15% Errors for Social Media)

Entity Linking

It’s a version of Chicago – the standard classic Macintosh menu font, with that distinctive thick diagonal in the “N”.

Chicago was used by default for Mac menus through MacOS 7.6, and OS 8 was released mid-1997.

Chicago VIII was one of the early 70s-era Chicago albums to catch my ear, along with Chicago II.

Knowledge Gap between Source and KB

Source: breaking news/new information/rumor
KB: bio, summary, snapshot of life

According to Darwin it is the Males who do the vamping.

Charles Robert Darwin, was an English naturalist and geologist best known for his contributions to evolutionary theory.

I had no idea the victim in the Jackson cases was publicized.

In the summer of 1993, Jackson was accused of child sexual abuse by a 13-year-old boy named Jordan Chandler and his father, Dr. Evan Chandler, a dentist.

I went to youtube and checked out the Gulf oil crisis: all of the posts are one month old, or older…

On April 20, 2010, the Deepwater Horizon oil platform, located in the Mississippi Canyon about 40 miles (64 km) off the Louisiana coast, suffered a catastrophic explosion; it sank a day-and-a-half later.

Fill in the Gap with Background Knowledge

Source: breaking news/new information/rumors
KB: bio, summary, snapshot of life

Christies denial of marriage privledges to gays will alienate independents and his “I wanted to have the people vote on it” will ring hollow.

Christie has said that he favoured New Jersey's law allowing same-sex couples to form civil unions, but would veto any bill legalizing same-sex marriage in New Jersey

Translation out of hype-speak: some kook made threatening noises at Brownback and go arrested

Samuel Dale "Sam" Brownback (born September 12, 1956) is an American politician, the 46th and current Governor of Kansas.

Connect/Sort

Background Knowledge

Man Accused Of Making Threatening Phone Call To Kansas Gov. Sam Brownback May Face Felony Charge

Knowledge Synthesis

The Stockholm Institute stated that 23 of 25 major armed conflicts in the world in 2000 occurred in impoverished nations.

Stockholm_International_Peace_Research_Institute

Stockholm_Institute_of_Education

Morphs

They passed a bill, and Christie the Hutt decides he's stull sucking up to be RomBot's running mate.

Christie the Hutt → Chris Christie; RomBot → Mitt Romney

Commonsense Knowledge

2008-07-26: During talks in Geneva attended by William J. Burns Iran refused to respond to Solana’s offers.

William_J._Burns (1861-1932)
William_Joseph_Burns (1956- )
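
One way to make this kind of commonsense constraint concrete is a temporal plausibility filter: a person cannot attend talks after their death. The sketch below is only an illustration. The slide does not say how such knowledge should be applied, and the record format and the `alive_on` helper are hypothetical. The years and the document date are the ones shown above.

```python
from datetime import date


def alive_on(candidate, when):
    """True if the candidate's lifespan covers the given date (year precision)."""
    born, died = candidate["born"], candidate["died"]
    return born <= when.year and (died is None or when.year <= died)


# Hypothetical candidate records built from the lifespans on the slide.
candidates = [
    {"id": "William_J._Burns", "born": 1861, "died": 1932},
    {"id": "William_Joseph_Burns", "born": 1956, "died": None},
]
doc_date = date(2008, 7, 26)

# Keep only candidates who could have attended an event on doc_date.
plausible = [c["id"] for c in candidates if alive_on(c, doc_date)]
print(plausible)
```

Here the filter leaves only William_Joseph_Burns, because the other candidate died in 1932.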

Commonsense Knowledge

The petition demanded the introduction of a parliament elected by all adults - men and women in Saudi Arabia.

Consultative Assembly of Saudi_Arabia

World Knowledge

Millions of Americans went to war for America, and came back broken or otherwise gave up a lot, and now we look to take a huge chunk of their hide because Washington no longer works.

Federal government of the United States

Collective Inference: What We've Done Before (Pan et al., 2015)

• Entity mentions involved in AMR conjunction relations should be linked jointly to the KB; their candidates in the KB should also be strongly connected to each other with high semantic relatedness.
o “and”, “or”, “contrast-01”, “either”, “compared to”, “prep along with”, “neither”, “slash”, “between” and “both”
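
The joint-linking constraint can be illustrated with a toy scorer that picks the candidate combination maximizing local evidence plus pairwise relatedness. This is a sketch of the general idea only and should not be read as the method of Pan et al. (2015). The function `link_jointly`, the `weight` parameter, the candidate lists, and every score in the usage example are invented for illustration.

```python
from itertools import product


def link_jointly(mentions, candidates, prior, relatedness, weight=1.0):
    """Choose one KB candidate per mention, scoring all mentions together.

    prior maps (mention, candidate) to a local score; relatedness(a, b)
    returns a semantic relatedness score between two KB candidates.
    """
    best_score, best_combo = float("-inf"), None
    # Exhaustive search is fine for the handful of mentions in a conjunction.
    for combo in product(*(candidates[m] for m in mentions)):
        local = sum(prior[(m, c)] for m, c in zip(mentions, combo))
        pairwise = sum(
            relatedness(a, b)
            for i, a in enumerate(combo)
            for b in combo[i + 1:]
        )
        score = local + weight * pairwise
        if score > best_score:
            best_score, best_combo = score, combo
    return dict(zip(mentions, best_combo))


# Invented toy data: two conjoined mentions with two candidates each.
mentions = ["Christie", "Romney"]
candidates = {
    "Christie": ["Agatha_Christie", "Chris_Christie"],
    "Romney": ["Mitt_Romney", "George_Romney"],
}
prior = {
    ("Christie", "Agatha_Christie"): 0.6,
    ("Christie", "Chris_Christie"): 0.4,
    ("Romney", "Mitt_Romney"): 0.7,
    ("Romney", "George_Romney"): 0.3,
}
related = {frozenset({"Chris_Christie", "Mitt_Romney"}): 0.9}


def relatedness(a, b):
    return related.get(frozenset({a, b}), 0.1)


print(link_jointly(mentions, candidates, prior, relatedness))
```

In this toy setup, linking "Christie" on its own would favor the candidate with the higher prior, while joint scoring prefers the pair of candidates that are strongly related to each other.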

Collective Inference: Beyond Sentence and Beyond Syntax

I think Mitt drops out...Ok, my answer is no one and Obama wins the GE.
