LOD2 Webinar Series: DBpedia Spotlight
LOD2 Webinar . 26.02.2013 . Page 1 http://lod2.eu
Creating Knowledge out of Interlinked Data
LOD2 is a large-scale integrating project co-funded by the European Commission within the FP7 Information and Communication Technologies Work Programme. This 4-year project brings together leading Linked Open Data technology researchers, companies, and service providers. The partners come from 12 countries and are coordinated by the Agile Knowledge Engineering and Semantic Web Research Group at the University of Leipzig, Germany.
LOD2 will integrate and syndicate Linked Data with existing large-scale applications. The project demonstrates the benefits in three scenarios: media and publishing, corporate data intranets, and eGovernment.
Once a month, the LOD2 webinar series offers a free webinar about tools and services along the Linked Open Data Life Cycle.
Stay with us and learn more about acquisition, editing, composing, connected applications – and finally publishing Linked Open Data.
Agenda
Profiles: Pablo N. Mendes and the DBpedia Spotlight team
Linked Data life cycle and the role of DBpedia Spotlight within LOD2
What is DBpedia Spotlight?
Demonstration
Lessons learned and next steps
Q&A
Pablo N. Mendes and the DBpedia Spotlight team
Pablo N. Mendes
Research Associate at the Open Knowledge Foundation, Germany (http://okfn.de)
Interests: Information Extraction, Integration, Retrieval and Exploration
More info: http://pablomendes.com
Contributors:
• Sandro Coelho (BS student at UFJF, Brazil)
• Chris Hokamp (PhD student at University of North Texas, USA)
• Dirk Weissenborn (MS student at University of Dresden, Germany)
• Liu Zhengzhong (now PhD student at Carnegie Mellon University, USA)
• Marcus Nitschke (student at U. Leipzig)
• ... Full list on GitHub.
Co-maintainers:
• Max Jakob (Neofonie GmbH)
• Joachim Daiber (MS student at the Rijksuniversiteit Groningen)
Funding: LOD2, DICODE, Google Summer of Code 2012, IKS
Hosting: U. Mannheim, MTA SZTAKI, Globo.com, RNP.br
Linked Data Life Cycle
[Figure: the Linked Data Life Cycle, with the stages Extraction; Storage/Querying; Manual revision/Authoring; Interlinking/Fusing; Classification/Enrichment; Quality Analysis; Evolution/Repair; Search/Browsing/Exploration]
Shedding Light on the Web of Documents
Named Entity Recognition/Disambiguation
• Automatically add Wikipedia links to (plain) text.
• 1. Recognition: find "interesting" strings (surface forms)
• 2. Disambiguation: choose the appropriate Wikipedia page
− Each Wikipedia page represents an entity
− Every surface form can have multiple candidate entities for linking
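The two steps can be illustrated with a toy, dictionary-based sketch. This is not DBpedia Spotlight's spotter. The lexicon, the entity labels, and the greedy longest-match strategy are all assumptions made for illustration. The labels follow the slides rather than real DBpedia URIs.

```python
# Toy illustration of recognition + candidate lookup. The lexicon is
# HYPOTHETICAL example data mapping lower-cased surface forms to entities.
LEXICON = {
    "michael jackson": ["Michael_Jackson_(singer)", "Michael_Jackson_(journalist)"],
    "paris": ["Paris", "Paris,_Texas"],
}
# Length (in tokens) of the longest surface form in the lexicon.
MAX_TOKENS = max(len(sf.split()) for sf in LEXICON)


def spot(text):
    """Step 1 (recognition): return (surface form, token index) pairs,
    preferring the longest lexicon entry that starts at each position."""
    tokens = [t.strip(".,;:!?") for t in text.split()]
    found = []
    i = 0
    while i < len(tokens):
        for n in range(min(MAX_TOKENS, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase.lower() in LEXICON:
                found.append((phrase, i))
                i += n
                break
        else:  # no surface form starts at this token; move on
            i += 1
    return found


def candidates(surface_form):
    """Input to step 2 (disambiguation): every entity the form may denote."""
    return LEXICON.get(surface_form.lower(), [])


if __name__ == "__main__":
    for sf, _ in spot("Michael Jackson came to Paris."):
        print(sf, "->", candidates(sf))
```

Every spotted surface form with more than one candidate still has to be disambiguated, which is the harder half of the problem.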
[Michael Jackson] died in 2007.
• Recognition: Find surface forms
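• Disambiguation: Choose the correct entity
• Candidates for [Michael Jackson], e.g. the singer and the journalist
• Context: "died in 2007" points to the journalist (the beer and whisky writer Michael Jackson died in 2007; the singer died in 2009)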
[Michael Jackson] came to Paris.
• Disambiguation: Choose the correct entity
• Candidates for [Michael Jackson]: Singer, Journalist
• Less distinctive context: "came to Paris" fits either candidate
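Below is a sketch of how context and a name's prior could interact in these two examples. Everything in it is an assumption for illustration:

- The context word sets are invented.
- Jaccard word overlap stands in for DBpedia Spotlight's actual context model.
- The back-off rule (use the prior when context does not separate the candidates) is a simple heuristic, not Spotlight's scoring.

Only the prior values 0.98/0.02 come from this webinar.

```python
# HYPOTHETICAL context words for each candidate (illustration only).
CONTEXT = {
    "Michael_Jackson_(singer)": {"pop", "singer", "album", "thriller", "music", "died", "2009"},
    "Michael_Jackson_(journalist)": {"beer", "whisky", "writer", "journalist", "died", "2007"},
}
# P(entity | "Michael Jackson"), values quoted in the webinar.
PRIOR = {"Michael_Jackson_(singer)": 0.98, "Michael_Jackson_(journalist)": 0.02}


def words(text):
    """Lower-cased set of words with surrounding punctuation removed."""
    return {w.strip(".,;:!?").lower() for w in text.split()}


def context_score(text, entity):
    """Jaccard overlap between the sentence and the entity's context words."""
    sentence, profile = words(text), CONTEXT[entity]
    return len(sentence & profile) / len(sentence | profile)


def disambiguate(text, entities, margin=0.05):
    """Pick the entity with the best context score, unless the two best
    scores differ by less than `margin`; then fall back to the prior."""
    scores = {e: context_score(text, e) for e in entities}
    ranked = sorted(entities, key=scores.get, reverse=True)
    if len(ranked) > 1 and scores[ranked[0]] - scores[ranked[1]] < margin:
        return max(entities, key=PRIOR.get)  # context not distinctive
    return ranked[0]
```

With "died in 2007" the context outweighs the strong prior and the journalist is chosen. With "came to Paris" neither candidate gains anything from the context, so the entity that is usually meant by the name (the singer) wins.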
Probabilities
• P(entity | surface form)
• Who is typically meant by a name?
• For example, given [Michael Jackson] (and ignoring the context), what are the probabilities of the candidates?
• Michael Jackson (singer): 0.98
• Michael Jackson (journalist): 0.02
• Other useful probabilities:
− P(surface form | entity), P(entity), P(surface form)
• Maximum likelihood estimates are computed from Wikipedia page links
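The maximum likelihood estimates can be sketched by counting (surface form, entity) pairs. Each Wikipedia link contributes its anchor text as the surface form and its target page as the entity. The link list below is invented so that the counts reproduce the 0.98/0.02 example. The function name and data layout are assumptions, not DBpedia Spotlight's code.

```python
from collections import Counter


def estimate(links):
    """Maximum likelihood estimates from (surface form, entity) link pairs."""
    pair = Counter(links)                  # count(surface form, entity)
    sf = Counter(s for s, _ in links)      # count(surface form)
    ent = Counter(e for _, e in links)     # count(entity)
    total = len(links)
    return {
        "P(e|s)": {(s, e): c / sf[s] for (s, e), c in pair.items()},
        "P(s|e)": {(s, e): c / ent[e] for (s, e), c in pair.items()},
        "P(e)": {e: c / total for e, c in ent.items()},
        "P(s)": {s: c / total for s, c in sf.items()},
    }


# HYPOTHETICAL link data: 98 + 2 links with the anchor "Michael Jackson",
# plus 2 links to the singer with the anchor "King of Pop".
LINKS = (
    [("Michael Jackson", "Michael_Jackson_(singer)")] * 98
    + [("Michael Jackson", "Michael_Jackson_(journalist)")] * 2
    + [("King of Pop", "Michael_Jackson_(singer)")] * 2
)

p = estimate(LINKS)
print(p["P(e|s)"][("Michael Jackson", "Michael_Jackson_(singer)")])  # 0.98
```

The same four counters yield all of the listed probabilities. Only the normalizing count changes: the surface form, the entity, or the total number of links.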
Data Processing
• Two pipelines
− Single machine with Scala
− MapReduce-style with Apache Pig
• Apache Pig for analyzing large datasets on top of Hadoop
− Data-flow language
− Think in tuples, bags and maps
− load, filter, join, group by, store, …
− from which Pig derives a MapReduce plan
− We build on pignlproc, started by Olivier Grisel (Stanbol)
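To illustrate the data-flow style, the sketch below mimics a Pig-like LOAD, FOREACH/GENERATE, FILTER and GROUP BY sequence in plain Python to harvest surface forms from Wikipedia link markup. It is not pignlproc's code and runs on a single machine. The regular expression handles only simple [[Target]] and [[Target|anchor]] links, and the colon-based filter is a crude assumption.

```python
import re
from collections import defaultdict

# Matches [[Target]] or [[Target|anchor text]] in wiki markup (simplified).
LINK = re.compile(r"\[\[([^\]|]+)(?:\|([^\]]+))?\]\]")


def load(pages):
    """LOAD: one (title, wikitext) tuple per page."""
    return list(pages)


def extract_links(pages):
    """FOREACH ... GENERATE: emit (surface form, target) tuples."""
    for _title, text in pages:
        for m in LINK.finditer(text):
            target = m.group(1).strip()
            anchor = (m.group(2) or m.group(1)).strip()
            yield anchor, target.replace(" ", "_")


def keep_articles(links):
    """FILTER: drop namespaced targets such as File: or Category:.
    Crude: this also drops article titles that contain a colon."""
    return (link for link in links if ":" not in link[1])


def group_by_surface_form(links):
    """GROUP BY surface form: collect the bag of targets per surface form."""
    bags = defaultdict(list)
    for surface_form, target in links:
        bags[surface_form].append(target)
    return dict(bags)
```

Chaining the steps, `group_by_surface_form(keep_articles(extract_links(load(pages))))`, mirrors how the Pig operators feed into each other. In Pig, the same chain would be compiled into a MapReduce plan instead of running in memory.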
Probability estimation
• P(entity | surface form) = count(surface form, entity) / count(surface form)
• P(Michael Jackson (singer) | Michael Jackson) = 0.98
• P(Michael Jackson (journalist) | Michael Jackson) = 0.02
• See the project website for how other scores are estimated:
– Other probabilities...
– TF*ICF (a modification of TF*IDF) and others...
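Here is a small sketch of the TF*ICF idea. It assumes the definition ICF(w) = log(|R_s| / n(w)), where R_s is the set of candidate entities for a surface form s and n(w) is the number of those candidates whose context contains the word w. IDF is computed over a whole document collection; ICF, by contrast, rewards only the words that separate the candidates of one surface form. The context counts below are invented.

```python
import math

# HYPOTHETICAL term frequencies in the context of each candidate of the
# surface form "Michael Jackson" (the candidate set R_s).
CANDIDATE_CONTEXTS = {
    "Michael_Jackson_(singer)": {"died": 3, "pop": 10, "album": 7},
    "Michael_Jackson_(journalist)": {"died": 2, "beer": 9, "whisky": 5},
}


def icf(word, contexts):
    """Inverse candidate frequency: log(|R_s| / n(word)); 0.0 if unseen."""
    n = sum(1 for counts in contexts.values() if word in counts)
    return math.log(len(contexts) / n) if n else 0.0


def tf_icf(entity, contexts):
    """Weight each context word of `entity` by TF * ICF."""
    return {w: tf * icf(w, contexts) for w, tf in contexts[entity].items()}


weights = tf_icf("Michael_Jackson_(singer)", CANDIDATE_CONTEXTS)
# "died" occurs in both candidates' contexts, so its weight is 0.0;
# "pop" occurs only for the singer and gets 10 * log(2).
```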
Annotate
http://dbpedia.org/resource/LSU_Tigers
http://dbpedia.org/resource/No._4_(album)
Top K Candidates
LSU_Tigers
Louisiana State University
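Returning the top K candidates amounts to ranking the candidate entities of a surface form by score and keeping the K best. In the sketch below, the scores, the placeholder entity "Some_other_candidate", and the optional `min_score` cut-off are hypothetical. Only the two candidate names come from the slide.

```python
import heapq


def top_k(candidate_scores, k=2, min_score=0.0):
    """Return up to k (entity, score) pairs, best first, dropping any
    candidate whose score is below min_score."""
    best = heapq.nlargest(k, candidate_scores.items(), key=lambda kv: kv[1])
    return [(entity, score) for entity, score in best if score >= min_score]


# HYPOTHETICAL scores for the candidates of a surface form such as "LSU".
scores = {"LSU_Tigers": 0.61, "Louisiana_State_University": 0.35, "Some_other_candidate": 0.04}
print(top_k(scores))  # [('LSU_Tigers', 0.61), ('Louisiana_State_University', 0.35)]
```

`heapq.nlargest` avoids sorting the full candidate list, which matters only when a surface form has many candidates.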
Demo: http://spotlight.dbpedia.org/demo/
Web Service: http://spotlight.dbpedia.org/rest/{API}
APIs:
• Phrase Recognition (/spot)
• Disambiguation (/disambiguate)
• Top K disambiguations (/candidates)
• Annotation (/annotate)
Source code: https://github.com/dbpedia-spotlight/dbpedia-spotlight/
License: Apache License, Version 2.0
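Below is a minimal sketch of preparing a call to the web service with Python's standard library. The base URL comes from the slide and may no longer be live. The endpoint name `annotate` and the query parameters `text`, `confidence` and `support` are assumptions about the REST API and should be checked against the project documentation on GitHub. The request is only built here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Base URL as given on the slide; it may have moved since 2013.
BASE_URL = "http://spotlight.dbpedia.org/rest"


def build_request(endpoint, text, confidence=0.5, support=0):
    """Prepare (but do not send) a GET request for a Spotlight endpoint.
    The parameter names are assumptions about the REST API."""
    query = urlencode({"text": text, "confidence": confidence, "support": support})
    return Request(f"{BASE_URL}/{endpoint}?{query}",
                   headers={"Accept": "application/json"})


req = build_request("annotate", "Michael Jackson died in 2007.")
print(req.full_url)
```

Passing `req` to `urllib.request.urlopen` would send it. A JSON `Accept` header is requested so that the response is easy to parse.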
Lessons learned
A generic solution to the problem is tough:
– Most of the research focuses on solving very specialized cases
– Some entity types are harder than others
– Some types of text are harder than others
Yet, users expect it to “just work”.
We are focusing on a generic core that can be easily customized.
Next steps
More experiments with DBpedia Spotlight in the context of the LOD2 use case packages: Wolters Kluwer (legal domain, German language); Emergency Response
Automating the build process and releasing to the LOD2 Stack
Expanding to other languages
Easier adaptation to other knowledge bases beyond DBpedia
New algorithms, collective disambiguation, etc.
Credits
Jingle R.E.M., Martin Kaltenböck, Florian Kondert
Coordination Thomas Thurner, Martin Kaltenböck
Moderation Martin Kaltenböck
Presented by Pablo N. Mendes
Slides from Pablo N. Mendes, Max Jakob, Joachim Daiber
Hope you enjoyed staying with us – if you need more detailed information, visit us at www.lod2.eu and let us know how we can improve to meet your expectations!
Don’t forget to register for our next webinars:
27.03.2013 – CKAN and PublicData.eu (OKFN)
April – Virtuoso 7 (OpenLink Software)
Have a great day and don’t forget ...