Knowledge Gaps for Entity Linking
Heng Ji
Outline
• Relation Clustering
• Remaining Challenges for Entity Linking
Relation Clustering: The paper from ten years ago (Hasegawa et al., 2004)
Relation Discovery Overview
• Assume that pairs of entities occurring in similar contexts can be clustered, and that each pair in a cluster is an instance of the same relation.
o 1. Tag NEs in text corpora.
o 2. Get co-occurrence pairs of NEs and their contexts.
o 3. Measure context similarities among pairs of NEs.
o 4. Make clusters of pairs of NEs.
o 5. Label each cluster of pairs of NEs.
• Run the NE tagger and collect all context words within a certain distance. If the context words of the A-B pair and the C-D pair are similar, the two pairs are placed into the same cluster (the same relation); in this example the relation is merger and acquisition.
Relation Discovery
• NE tagging: use the extended NE tagger (Sekine, 2001) to detect useful relations.
• Collect intervening words between two NEs for each co-occurrence.
o Two NEs are considered to co-occur if they appear within the same sentence and are separated by at most N intervening words.
o Different orders are considered as different contexts. That is, e1…e2 and e2…e1 are collected as different contexts.
o Passive voice: collect the base forms of words, as stemmed by a POS tagger, but keep verb past participles distinct from other verb forms so that passive constructions are not merged with active ones.
• Less frequent pairs of NEs should be eliminated.
o Set a frequency threshold.
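The collection step above can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the input format (entity tokens as `(text, type)` tuples, other tokens as plain strings), the function name `collect_contexts`, and the default values of `max_gap` and `min_freq` are all assumptions made for the example. Lemmatization and the past-participle handling are left out.

```python
from collections import defaultdict

def collect_contexts(sentences, max_gap=5, min_freq=2):
    """Collect intervening words for co-occurring NE pairs.

    sentences: list of token lists. An entity token is a (text, type) tuple,
    any other token is a plain string (hypothetical input format).
    Returns {pair: [(direction, words), ...]} where direction is +1 when the
    pair appears in its canonical (sorted) order and -1 when reversed.
    """
    pairs = defaultdict(list)
    for tokens in sentences:
        ents = [i for i, tok in enumerate(tokens) if isinstance(tok, tuple)]
        for a in range(len(ents)):
            for b in range(a + 1, len(ents)):
                i, j = ents[a], ents[b]
                if j - i - 1 > max_gap:
                    break  # later entities are even farther away
                # keep only the plain words between the two mentions
                words = [t.lower() for t in tokens[i + 1:j] if isinstance(t, str)]
                first, second = tokens[i], tokens[j]
                key = tuple(sorted([first, second]))
                direction = 1 if key[0] == first else -1
                pairs[key].append((direction, words))
    # frequency threshold: drop pairs that co-occur too rarely
    return {k: v for k, v in pairs.items() if len(v) >= min_freq}
```

Keeping the direction as a sign on each occurrence, rather than merging e1…e2 and e2…e1, preserves the distinction between the two orders for whatever similarity measure is applied afterwards.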
Relation Discovery
• Calculate similarity between the sets of contexts of NE pairs.
o Vector space model and cosine similarity
o Only compare NE pairs which have the same types, e.g., one PERSON-GPE pair and another PERSON-GPE pair.
o Eliminate stop words, words in parallel expressions, and expressions peculiar to particular source documents.
• A context vector for each NE pair consists of the bag of words formed from all intervening words from all co-occurrences of two NEs.
o Different orders: if a word wi occurred L times in e1…e2, M times in e2…e1, the tfi of wi is defined as L-M.
o If the norm |α| is small due to the lack of context words, the similarity might be unreliable, so define a threshold to eliminate short context vectors.
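A small sketch of the context vector and similarity computation described above. It is an illustration under assumptions: the vectors use the signed term count L-M directly with no further weighting, the function names and the `min_norm` default are invented for the example, and restricting comparisons to NE pairs of the same types is assumed to happen in the caller.

```python
import math
from collections import Counter

def context_vector(occurrences, stop_words=frozenset()):
    """Build a signed bag of words for one NE pair.

    occurrences: list of (direction, words); direction is +1 for e1…e2
    and -1 for e2…e1, so each word ends up with the value L - M.
    """
    vec = Counter()
    for direction, words in occurrences:
        for w in words:
            if w not in stop_words:
                vec[w] += direction
    return {w: c for w, c in vec.items() if c != 0}

def norm(vec):
    return math.sqrt(sum(c * c for c in vec.values()))

def cosine(u, v, min_norm=1.0):
    """Cosine similarity, or None when either vector is too short to trust."""
    nu, nv = norm(u), norm(v)
    if nu < min_norm or nv < min_norm:
        return None
    dot = sum(c * v.get(w, 0) for w, c in u.items())
    return dot / (nu * nv)
```

With signed counts, two pairs that share the same words but in opposite orders get a negative cosine, which is one way the e1…e2 versus e2…e1 distinction shows up in the similarity.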
Relation Discovery
• We can cluster the NE pairs based on the similarity among their context vectors.
o We do not know the # of clusters in advance, so we adopt hierarchical clustering.
o Using complete linkage
• Label the cluster with the most frequent word in all combinations of the NE pairs in the same cluster.
o The frequencies are normalized.
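The clustering and labeling steps can be sketched as below. This is a plain-Python illustration under assumptions, not the original implementation: the stopping `threshold`, the function names, and the toy vectors are invented. The labeling function is one reading of the slide: it counts how many combinations of two NE pairs in the cluster share a word, normalized by the number of combinations.

```python
import math
from collections import Counter
from itertools import combinations

def cosine(u, v):
    dot = sum(c * v.get(w, 0) for w, c in u.items())
    nu = math.sqrt(sum(c * c for c in u.values()))
    nv = math.sqrt(sum(c * c for c in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def complete_linkage(items, sim, threshold):
    """Agglomerative clustering; stops when no two clusters are similar enough."""
    clusters = [[x] for x in items]
    while len(clusters) > 1:
        best, best_pair = None, None
        for a, b in combinations(range(len(clusters)), 2):
            # complete linkage: cluster similarity = least similar member pair
            s = min(sim(x, y) for x in clusters[a] for y in clusters[b])
            if best is None or s > best:
                best, best_pair = s, (a, b)
        if best < threshold:
            break
        a, b = best_pair
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters

def label_cluster(cluster, vectors):
    """Most frequent shared word over all combinations of NE pairs."""
    combos = list(combinations(cluster, 2))
    shared = Counter()
    for p, q in combos:
        shared.update(set(vectors[p]) & set(vectors[q]))
    if not shared:
        return None  # singleton cluster or no common words
    word, count = shared.most_common(1)[0]
    return word, count / len(combos)

# Toy context vectors, invented for illustration only.
vectors = {
    "A-B": {"acquire": 2, "buy": 1},
    "C-D": {"acquire": 1},
    "E-F": {"acquire": 1, "buy": 1},
    "G-H": {"president": 3},
}
clusters = complete_linkage(
    list(vectors), lambda x, y: cosine(vectors[x], vectors[y]), threshold=0.5
)
```

Because the number of clusters is not fixed in advance, the threshold on the linkage similarity is what decides where merging stops.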
Discussions
• How will embeddings play a role here?
• Did/Will we make fundamental changes to this “old” framework?
Outline
• Relation Clustering
• Remaining Challenges for Entity Linking (10% Errors for News and 15% Errors for Social Media)
Entity Linking
It’s a version of Chicago – the standard classic Macintosh menu font, with that distinctive thick diagonal in the “N”.
Chicago was used by default for Mac menus through MacOS 7.6, and OS 8 was released mid-1997.
Chicago VIII was one of the early 70s-era Chicago albums to catch my ear, along with Chicago II.
Knowledge Gap between Source and KB
Source: breaking news/new information/rumor
KB: bio, summary, snapshot of life
According to Darwin it is the Males who do the vamping.
Charles Robert Darwin, was an English naturalist and geologist best known for his contributions to evolutionary theory.
I had no idea the victim in the Jackson cases was publicized.
In the summer of 1993, Jackson was accused of child sexual abuse by a 13-year-old boy named Jordan Chandler and his father, Dr. Evan Chandler, a dentist.
I went to youtube and checked out the Gulf oil crisis: all of the posts are one month old, or older…
On April 20, 2010, the Deepwater Horizon oil platform, located in the Mississippi Canyon about 40 miles (64 km) off the Louisiana coast, suffered a catastrophic explosion; it sank a day-and-a-half later.
Fill in the Gap with Background Knowledge
Source: breaking news/new information/rumors
KB: bio, summary, snapshot of life
Christies denial of marriage privledges to gays will alienate independents and his “I wanted to have the people vote on it” will ring hollow.
Christie has said that he favoured New Jersey's law allowing same-sex couples to form civil unions, but would veto any bill legalizing same-sex marriage in New Jersey
Translation out of hype-speak: some kook made threatening noises at Brownback and go arrested
Samuel Dale "Sam" Brownback (born September 12, 1956) is an American politician, the 46th and current Governor of Kansas.
Connect/Sort Background Knowledge
Man Accused Of Making Threatening Phone Call To Kansas Gov. Sam Brownback May Face Felony Charge
The Stockholm Institute stated that 23 of 25 major armed conflicts in the world in 2000 occurred in impoverished nations.
Knowledge Synthesis
Stockholm_International_Peace_Research_Institute
Stockholm_Institute_of_Education
They passed a bill, and Christie the Hutt decides he's stull sucking up to be RomBot's running mate.
Morphs
Christie the Hutt = Chris Christie; RomBot = Mitt Romney
2008-07-26: During talks in Geneva attended by William J. Burns, Iran refused to respond to Solana’s offers.
Commonsense Knowledge
William_J._Burns (1861-1932)
William_Joseph_Burns (1956- )
The petition demanded the introduction of a parliament elected by all adults - men and women in Saudi Arabia.
Commonsense Knowledge
Consultative Assembly of Saudi_Arabia
Millions of Americans went to war for America, and came back broken or otherwise gave up a lot, and now we look to take a huge chunk of their hide because Washington no longer works.
World Knowledge
Federal government of the United States
• Entity mentions involved in AMR conjunction relations should be linked jointly to the KB; their candidates in the KB should also be strongly connected to each other with high semantic relatedness.
o “and”, “or”, “contrast-01”, “either”, “compared to”, “prep along with”, “neither”, “slash”, “between” and “both”
Collective Inference: What We've Done Before (Pan et al., 2015)
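To make the idea of joint linking concrete, here is a toy sketch. It is not the method of Pan et al. (2015): the relatedness measure (Jaccard overlap of KB neighbor sets), the exhaustive search, the function names, and all entity IDs and neighbor sets are assumptions invented for illustration. It picks, for the mentions joined by a conjunction, the combination of KB candidates that is most strongly connected.

```python
from itertools import combinations, product

def jaccard(a, b):
    """Overlap of two neighbor sets, used here as a stand-in for relatedness."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def link_jointly(candidates, neighbors):
    """Choose one KB candidate per mention so that the chosen entities
    are maximally related to each other.

    candidates: {mention: [kb_id, ...]}
    neighbors:  {kb_id: set of related kb_ids or attributes}
    """
    mentions = list(candidates)
    best_score, best = -1.0, None
    for assignment in product(*(candidates[m] for m in mentions)):
        score = sum(
            jaccard(neighbors[x], neighbors[y])
            for x, y in combinations(assignment, 2)
        )
        if score > best_score:
            best_score, best = score, dict(zip(mentions, assignment))
    return best, best_score

# Invented toy KB: two conjoined mentions, two candidates each.
candidates = {
    "mention_1": ["E1_politician", "E1_painter"],
    "mention_2": ["E2_politician", "E2_city"],
}
neighbors = {
    "E1_politician": {"election", "senate", "party"},
    "E1_painter": {"gallery", "portrait"},
    "E2_politician": {"election", "party", "governor"},
    "E2_city": {"river", "county"},
}
links, score = link_jointly(candidates, neighbors)
```

Exhaustive search is only workable for the handful of mentions in a single conjunction; a real system would also combine this coherence score with each mention's own local evidence.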
I think Mitt drops out...Ok, my answer is no one and Obama wins the GE.
Collective Inference: Beyond Sentence and Beyond Syntax