Named Entity Disambiguation using Linked Data

Danica Damljanović, The University of Sheffield

Brunel University London, 05 March 2012

Named Entity Disambiguation in TrendMiner

[Figure: TrendMiner platform diagram. Labels: Newswire; Market data; Polls; Multilingual Text Processing (EN, DE, IT, BG, HI); Time-Series Machine Learning models; Cross-Lingual Summarisation; Knowledge-based Search and Browse; TrendMiner Platform; Financial Decisions; Political Analysis; Hardik Fintrade Pvt. Ltd.; SORA; Eurokleis srl]

Named Entity Recognition is the first step, and it is important to get it right!

Example

Linked Data

Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/

Why DBpedia?

Regularly updated (from Wikipedia)
Good source for named entities
A hierarchy of concepts
  a capital is also a city, but not vice versa
Relations between concepts
  Paris locatedIn France
  ParisHilton bornIn NewYorkCity

Named Entity Recognition

ANNIE
  Produces NE types such as Organization, Location and Person
  Resolves coreference
    Entities with the same meaning are linked
    E.g. General Motors and GM

Entity Linking

The Large Knowledge Gazetteer (LKB)
  Matches text against URIs
  Matches only against the values of the rdfs:label and foaf:name properties
  For all instances of the classes:
    dbpedia-ont:Person
    dbpedia-ont:Organisation
    dbpedia-ont:Place
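The label matching described above can be illustrated with a minimal sketch. This is not the LKB gazetteer itself: the label table, the URIs, and the functions `build_gazetteer` and `lookup_mentions` are hypothetical, chosen only to show how a context-free lookup returns every URI whose label matches a span of text.

```python
from collections import defaultdict

# Hypothetical (label, URI) pairs standing in for rdfs:label / foaf:name values
# of Person, Organisation and Place instances. Invented for illustration.
LABELS = [
    ("Paris", "dbpedia:Paris"),
    ("Paris", "dbpedia:Paris,_Ontario"),
    ("Paris Hilton", "dbpedia:Paris_Hilton"),
    ("France", "dbpedia:France"),
]


def build_gazetteer(labels):
    """Map each lower-cased label to the set of URIs that carry it."""
    gazetteer = defaultdict(set)
    for label, uri in labels:
        gazetteer[label.lower()].add(uri)
    return gazetteer


def lookup_mentions(text, gazetteer, max_tokens=3):
    """Return (phrase, URIs) for every token span whose text equals a label."""
    tokens = text.split()
    matches = []
    for start in range(len(tokens)):
        last = min(start + max_tokens, len(tokens))
        for end in range(start + 1, last + 1):
            phrase = " ".join(tokens[start:end])
            uris = gazetteer.get(phrase.lower())
            if uris:
                matches.append((phrase, sorted(uris)))
    return matches


gazetteer = build_gazetteer(LABELS)
for phrase, uris in lookup_mentions("Paris Hilton visited France", gazetteer):
    print(phrase, uris)
```

Because the lookup ignores context, the span "Paris" is returned with both place URIs even though the sentence is about a person.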

So, why not just combine them?

NEs generated by ANNIE have a type but no URI
LKB does not use any context
  Spurious entities
  E.g. each letter B is annotated as a possible mention of dbpedia:B_%28Los_Angeles_Railway%29
    Refers to a line called B operated by Los Angeles Railway

How to filter out the noise?

Identify NEs (Location, Organisation and Person) using ANNIE

For each NE add URIs of matching instances from DBpedia

For each ambiguous NE calculate disambiguation scores

Remove all matches except the highest scoring one
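The four steps above can be sketched as a small filter. The function `disambiguate` and its two callback parameters are hypothetical names; the NE recogniser and the DBpedia lookup are assumed to exist elsewhere and are replaced by toy data in the usage lines.

```python
def disambiguate(mentions, candidates_for, score):
    """Keep, for each recognised mention, only the highest-scoring candidate URI.

    mentions       -- NE strings already identified by the recogniser
    candidates_for -- callable returning the list of matching URIs for a mention
    score          -- callable (mention, uri) -> disambiguation score
    """
    linked = {}
    for mention in mentions:
        candidates = candidates_for(mention)
        if not candidates:
            continue  # no DBpedia match: the NE stays unlinked
        if len(candidates) == 1:
            linked[mention] = candidates[0]  # unambiguous, no scoring needed
        else:
            linked[mention] = max(candidates, key=lambda uri: score(mention, uri))
    return linked


# Toy inputs, invented for illustration only.
toy_candidates = {
    "Paris": ["dbpedia:Paris", "dbpedia:Paris_Hilton"],
    "France": ["dbpedia:France"],
}
toy_scores = {
    ("Paris", "dbpedia:Paris"): 0.9,
    ("Paris", "dbpedia:Paris_Hilton"): 0.4,
}
print(disambiguate(
    ["Paris", "France", "Sheffield"],
    lambda mention: toy_candidates.get(mention, []),
    lambda mention, uri: toy_scores[(mention, uri)],
))
```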

Disambiguation score

Uses context
A weighted sum of the three similarity metrics
  String similarity
  Structural similarity
  Contextual similarity
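As a rough sketch, the weighted sum could be written as below. The slides do not give the weights, so the equal weights used as a default are a placeholder assumption, and the function name `disambiguation_score` is hypothetical.

```python
def disambiguation_score(string_sim, structural_sim, contextual_sim,
                         weights=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted sum of the three similarities.

    The default equal weights are a placeholder; the actual values are not
    stated in the slides.
    """
    w_string, w_structural, w_contextual = weights
    return (w_string * string_sim
            + w_structural * structural_sim
            + w_contextual * contextual_sim)


# Example with made-up similarity values and made-up weights.
print(disambiguation_score(0.5, 1.0, 0.0, weights=(0.5, 0.25, 0.25)))
```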

String similarity

Measures the edit distance between the text string and the labels of the matching URIs

Pair                             Levenshtein   Jaccard   Monge-Elkan
Paris / Paris Hilton             0.4166667     0.5       1.0
Paris / Paris, Ontario           0.35714287    0.0       1.0
Paris Hilton / Paris, Ontario    0.4285714     0.0       0.6333333
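A minimal sketch of two of these metrics follows. It assumes that the Levenshtein figure is the edit distance normalised as 1 - distance / length of the longer string, and that Jaccard is computed over whitespace-separated tokens. Both are assumptions about the tooling, although they are consistent with the Levenshtein and Jaccard values listed above. The third metric is left out.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


def levenshtein_similarity(a, b):
    """Assumed normalisation: 1 - distance / length of the longer string."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def jaccard_similarity(a, b):
    """Jaccard overlap of whitespace-separated token sets (assumed tokenisation)."""
    tokens_a, tokens_b = set(a.split()), set(b.split())
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


for left, right in [("Paris", "Paris Hilton"),
                    ("Paris", "Paris, Ontario"),
                    ("Paris Hilton", "Paris, Ontario")]:
    print(left, "|", right,
          round(levenshtein_similarity(left, right), 7),
          jaccard_similarity(left, right))
```

Under whitespace tokenisation "Paris," keeps its comma, which is why "Paris" and "Paris, Ontario" share no token.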

Structural similarity

Is there a relation between the ambiguous NE and any other NE from the same sentence or document?

Paris ... France >> true (Paris capitalOf France)
Paris ... New York >> true (ParisHilton bornIn NewYorkCity)
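The relation check can be sketched over a toy set of triples. The triples, the URI spellings, and the binary 1.0 / 0.0 scoring are assumptions made for illustration; the slides only say that the test asks whether a relation exists.

```python
# Toy triples (subject, predicate, object), invented for illustration.
TRIPLES = {
    ("dbpedia:Paris", "capitalOf", "dbpedia:France"),
    ("dbpedia:Paris_Hilton", "bornIn", "dbpedia:New_York_City"),
}


def related(uri_a, uri_b, triples=TRIPLES):
    """True if any triple links the two URIs directly, in either direction."""
    return any({subject, obj} == {uri_a, uri_b} for subject, _, obj in triples)


def structural_similarity(candidate, context_uris, triples=TRIPLES):
    """1.0 if the candidate is related to any other NE in the context, else 0.0."""
    return 1.0 if any(related(candidate, other, triples)
                      for other in context_uris) else 0.0


# "Paris ... France": the city candidate is supported, the person is not.
print(structural_similarity("dbpedia:Paris", ["dbpedia:France"]))
print(structural_similarity("dbpedia:Paris_Hilton", ["dbpedia:France"]))
```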

Contextual similarity

The probability that two words appear with a similar set of other words (Random Indexing)

Paris France
  0.9999999:paris, 0.3674829:métro, 0.356694:paul-martin, 0.34328446:lewden, 0.33907568:pimpfen, 0.33907568:théas, 0.33907568:werfft, 0.33907568:birmoverse, 0.33907568:cszhech, 0.330207:pierre

Paris Ontario
  0.6818793:paris, 0.6818793:ontario, 0.5707274:merrickville-wolford, 0.5707274:naiscoutaing, 0.5707274:neguaguon, 0.5707274:magnetewan, 0.5707274:wabauskang, 0.5679094:tp, 0.5468101:s-e, 0.42145208:henvey

Paris Hilton
  0.7042532:hilton, 0.70425296:paris, 0.2825679:poverty-related, 0.276114:jaumont, 0.276114:jaune-montagne, 0.276114:malancourt-la-montagne, 0.26384133:mons–january, 0.26142785:métro, 0.26125407:tank-tread, 0.26125407:“plane’s
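To make the idea behind Random Indexing concrete, here is a toy sketch, not the tool used to produce the figures above. Each word gets a sparse random index vector, and a word's context vector is the sum of the index vectors of the words it co-occurs with. The dimensionality, the sentence-level context window, the three-sentence corpus, and the use of cosine similarity for comparison are all assumptions.

```python
import math
import random
from collections import defaultdict


def make_index_vector(rng, dim, nonzero):
    """Sparse ternary vector: `nonzero` entries set to +1 or -1, the rest 0."""
    vector = [0.0] * dim
    for position in rng.sample(range(dim), nonzero):
        vector[position] = rng.choice((-1.0, 1.0))
    return vector


def build_context_vectors(sentences, dim=64, nonzero=4, seed=0):
    """Sum, for each word, the index vectors of the other words in its sentence."""
    rng = random.Random(seed)
    index_vectors = {}
    context_vectors = defaultdict(lambda: [0.0] * dim)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for token in tokens:
            if token not in index_vectors:
                index_vectors[token] = make_index_vector(rng, dim, nonzero)
        for i, token in enumerate(tokens):
            target = context_vectors[token]
            for j, neighbour in enumerate(tokens):
                if i == j:
                    continue
                for k, value in enumerate(index_vectors[neighbour]):
                    target[k] += value
    return dict(context_vectors)


def cosine(a, b):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Toy corpus, invented for illustration.
corpus = [
    "paris lies in france",
    "lyon lies in france",
    "hilton runs hotels worldwide",
]
vectors = build_context_vectors(corpus)
print(cosine(vectors["paris"], vectors["lyon"]))    # same neighbours in the corpus
print(cosine(vectors["paris"], vectors["hilton"]))  # no shared neighbours
```

In this toy corpus "paris" and "lyon" occur with exactly the same neighbours, so their context vectors coincide, while "paris" and "hilton" share none.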

Evaluation

                            Precision   Recall   F-measure
LKB                         0.03        0.86     0.05
LKB+ANNIE                   0.14        0.81     0.24
LKB+ANNIE+Disambiguation    0.66        0.75     0.70

100 Wikipedia user profiles manually annotated
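For reference, the F-measure column can be recomputed from precision and recall, assuming the balanced F1 (harmonic mean). The slides do not state which variant was used. From the rounded inputs, the second and third rows come out at the reported values; the first row comes out near 0.06 rather than 0.05, which may simply reflect rounding of the inputs.

```python
def f_measure(precision, recall, beta=1.0):
    """F-beta score; beta=1 gives the harmonic mean of precision and recall."""
    if precision == 0 and recall == 0:
        return 0.0
    beta_sq = beta * beta
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)


# Precision and recall as reported in the evaluation table.
rows = {
    "LKB": (0.03, 0.86),
    "LKB+ANNIE": (0.14, 0.81),
    "LKB+ANNIE+Disambiguation": (0.66, 0.75),
}
for name, (precision, recall) in rows.items():
    print(f"{name}: {f_measure(precision, recall):.2f}")
```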

Conclusion

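Using Linked Data as an additional knowledge source for resolving context eliminated a large number of incorrect annotations: in the evaluation, precision rose from 0.14 (LKB+ANNIE) to 0.66 with disambiguation, while recall fell from 0.81 to 0.75.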

Thank You!

Questions?

More about the project: http://www.trendminer-project.eu
