Named Entity Disambiguation using Linked Data
Danica Damljanović, The University of Sheffield
Brunel University London, 05 March 2012
Named Entity Disambiguation in TrendMiner
Newswire
Market data
Polls
…
Multilingual Text Processing (EN, DE, IT, BG, HI)
Time-Series Machine Learning Models
Cross-Lingual Summarisation
Knowledge-based Search and Browse
TrendMiner Platform
Financial Decisions
Political Analysis
Named Entity Recognition is the first step, and it is important to get it right!
Hardik Fintrade Pvt. Ltd.
SORA
Eurokleis srl
Example
Linked Data
Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/
Why DBpedia?
Regularly updated (from Wikipedia)
Good source for named entities
A hierarchy of concepts: a capital is also a city, but not vice versa
Relations between concepts: Paris locatedIn France; ParisHilton bornIn NewYorkCity
Named Entity Recognition
ANNIE produces NE types such as Organization, Location and Person
Resolves coreference: entities with the same meaning are linked, e.g. General Motors and GM
Entity Linking
The Large Knowledge Base Gazetteer (LKB) matches text against URIs
Matches only against the values of the rdf:label and foaf:name properties
For all instances of the classes dbpedia-ont:Person, dbpedia-ont:Organisation and dbpedia-ont:Place
So, why not just combine them?
NE types generated by ANNIE miss the URI
LKB does not use any context, producing spurious entities
E.g. each letter B is annotated as a possible mention of dbpedia:B_%28Los_Angeles_Railway%29, which refers to a line called B operated by the Los Angeles Railway
How to filter out the noise?
Identify NEs (Location, Organisation and Person) using ANNIE
For each NE add URIs of matching instances from DBpedia
For each ambiguous NE calculate disambiguation scores
Remove all matches except the highest scoring one
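The four steps above can be sketched as follows. The data structures and helper names are hypothetical stand-ins; the actual system runs as a GATE pipeline.

```python
# Sketch of the filtering steps; candidate_uris and score are hypothetical
# stand-ins for the LKB matches and the disambiguation score.

def filter_annotations(named_entities, candidate_uris, score):
    """For each NE, keep only the highest-scoring DBpedia candidate."""
    resolved = {}
    for ne in named_entities:                    # step 1: NEs from ANNIE
        candidates = candidate_uris.get(ne, [])  # step 2: matching URIs
        if not candidates:
            continue
        # steps 3-4: score every candidate URI, keep only the best one
        resolved[ne] = max(candidates, key=lambda uri: score(ne, uri))
    return resolved

# Toy usage with a dummy scorer that simply prefers shorter URIs
uris = {"Paris": ["dbpedia:Paris", "dbpedia:Paris,_Ontario"]}
best = filter_annotations(["Paris"], uris, score=lambda ne, uri: -len(uri))
print(best)  # {'Paris': 'dbpedia:Paris'}
```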
Disambiguation score
Uses context: a weighted sum of three similarity metrics
String similarity, structural similarity and contextual similarity
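The combination is a plain weighted sum; the slides do not give the actual weights, so the values below are illustrative only:

```python
# Illustrative weights; the actual values used in the system are not given.
W_STRING, W_STRUCTURAL, W_CONTEXTUAL = 0.4, 0.3, 0.3

def disambiguation_score(string_sim, structural_sim, contextual_sim):
    """Weighted sum of the three similarity metrics, each assumed in [0, 1]."""
    return (W_STRING * string_sim
            + W_STRUCTURAL * structural_sim
            + W_CONTEXTUAL * contextual_sim)

print(round(disambiguation_score(1.0, 1.0, 0.9), 2))  # 0.97
```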
String similarity
Refers to the edit distance between the text string and the labels of the matching URIs
Paris and Paris Hilton
Levenshtein: 0.4166667
Jaccard: 0.5
MongeElkan: 1.0
Paris and Paris, Ontario
Levenshtein: 0.35714287
Jaccard: 0.0
MongeElkan: 1.0
Paris Hilton and Paris, Ontario
Levenshtein: 0.4285714
Jaccard: 0.0
MongeElkan: 0.6333333
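As a sketch (not the library implementations used in the system), normalized Levenshtein similarity and whitespace-token Jaccard reproduce the figures above:

```python
def levenshtein_similarity(a, b):
    """1 - edit_distance / max_length, e.g. 'Paris' vs 'Paris Hilton' -> 5/12."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return 1 - prev[-1] / max(len(a), len(b))

def jaccard_similarity(a, b):
    """Overlap of whitespace-separated tokens: |A ∩ B| / |A ∪ B|."""
    ta, tb = set(a.split()), set(b.split())
    return len(ta & tb) / len(ta | tb)

print(round(levenshtein_similarity("Paris", "Paris Hilton"), 7))  # 0.4166667
print(jaccard_similarity("Paris", "Paris Hilton"))                # 0.5
print(jaccard_similarity("Paris", "Paris, Ontario"))              # 0.0
```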
Structural similarity
Is there a relation between the ambiguous NE and any other NE from the same sentence or document?
Paris ... France >> true (Paris capitalOf France)
Paris ... New York >> true (ParisHilton bornIn NewYorkCity)
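A minimal illustration of the idea, with a hand-made relation set standing in for the DBpedia graph (the real system looks relations up in the knowledge base):

```python
# Toy stand-in for DBpedia triples; hypothetical, for illustration only.
TRIPLES = {
    ("dbpedia:Paris", "capitalOf", "dbpedia:France"),
    ("dbpedia:Paris_Hilton", "bornIn", "dbpedia:New_York_City"),
}

def related(uri_a, uri_b):
    """True if any triple directly connects the two candidate URIs."""
    return any({s, o} == {uri_a, uri_b} for s, _, o in TRIPLES)

print(related("dbpedia:Paris", "dbpedia:France"))         # True
print(related("dbpedia:Paris_Hilton", "dbpedia:France"))  # False
```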
Contextual similarity
The probability that two words appear with a similar set of other words (Random Indexing)
Paris, France: 0.9999999 paris, 0.3674829 métro, 0.356694 paul-martin, 0.34328446 lewden, 0.33907568 pimpfen, 0.33907568 théas, 0.33907568 werfft, 0.33907568 birmoverse, 0.33907568 cszhech, 0.330207 pierre
Paris, Ontario: 0.6818793 paris, 0.6818793 ontario, 0.5707274 merrickville-wolford, 0.5707274 naiscoutaing, 0.5707274 neguaguon, 0.5707274 magnetewan, 0.5707274 wabauskang, 0.5679094 tp, 0.5468101 s-e, 0.42145208 henvey
Paris Hilton: 0.7042532 hilton, 0.70425296 paris, 0.2825679 poverty-related, 0.276114 jaumont, 0.276114 jaune-montagne, 0.276114 malancourt-la-montagne, 0.26384133 mons–january, 0.26142785 métro, 0.26125407 tank-tread, 0.26125407 “plane’s
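The Random Indexing scores above are cosine similarities between context vectors; a minimal sketch of the comparison step, with made-up low-dimensional vectors (real Random Indexing vectors have hundreds of dimensions):

```python
import math

def cosine(u, v):
    """Cosine similarity between two context vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Made-up context vectors for two terms; hypothetical values only
paris = [0.9, 0.1, 0.4]
metro = [0.8, 0.2, 0.3]
print(round(cosine(paris, metro), 3))
```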
Evaluation
                          Precision  Recall  F-measure
LKB                       0.03       0.86    0.05
LKB+ANNIE                 0.14       0.81    0.24
LKB+ANNIE+Disambiguation  0.66       0.75    0.70
100 Wikipedia user profiles manually annotated
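F-measure here is the harmonic mean of precision and recall (F1); the best configuration's figures check out:

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (F1)."""
    return 2 * precision * recall / (precision + recall)

print(round(f_measure(0.66, 0.75), 3))  # 0.702, i.e. 0.70 as reported
```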
Conclusion
Using Linked Data as an additional knowledge source for resolving context eliminated a large number of incorrect annotations
Thank You!
Questions?
More about the project:http://www.trendminer-project.eu
Contact: