Dbpedia Estrazione di conoscenza da wikipediacena/materiale/BusinessIntelligence/DBpedia.pdf ·...
03/01/18
DBpedia: Knowledge Extraction from Wikipedia
DBpedia Tutorial, 09.02.2015, http://dbpedia.org
Wikipedia
Wikipedia articles:
– 4.7 million articles; 780 article additions per day
– are highly topical
– contain only few errors, which can easily be revised
– often cover very specific content
→ Wikipedia is the knowledge compendium of humanity.
Benefits of using Linked Data
Consumer view:
- link data from any other place on the web
- discover more related data while consuming data
- reuse parts of the data
- reuse existing tools and libraries
- combine data safely with other data
- query data over different repositories
Publisher view:
- make your data discoverable
- increase the value of your data (by linking it)
- have fine-grained control over the data items and optimise their access
- design data to fit your domain knowledge
What's DBpedia?
– DBpedia is a community effort to extract structured information from Wikipedia and to make this information available on the Web.
– DBpedia allows you to ask sophisticated queries against Wikipedia, and to link other data sets on the Web to Wikipedia data.
What's DBpedia?
– the DBpedia project was started in 2006
– has been a key factor for the success of the Linked Open Data initiative
– serves as an interlinking hub for other data sets
– provides a testbed serving real data spanning various domains
– available in more than 120 language editions
Where is Wikipedia information useful?
„Which films starred John Cleese without any other members of Monty Python?“
„What have Dublin and Leipzig in common?“
„Which Software products are developed by an organisation founded in California?“
„Which populated places in Germany are below sea level?“
Where is Wikipedia information useful?
● as terminology and concept repository and fact source for Entity Linking and Disambiguation:
The series follows the adventures of a space-faring crew onboard the starship USS Enterprise (NCC-1701-D), the fifth Federation vessel to bear the name and registry and the seventh starship by that name.
The Enterprise is commanded by Captain Jean-Luc Picard and is staffed by first officer Commander William Riker, operations manager Data, security chief Tasha Yar, ship's counselor Deanna Troi, chief medical officer Dr. Beverly Crusher, conn officer Lieutenant Geordi La Forge, and junior officer Lieutenant Worf.
⇒ no company, no aircraft carrier, no satellite
⇒ correlate the mentions with the concept starship
⇒ Star Trek rank, contemporary or past military or law enforcement
Why search engines aren't always enough
„Which films starred John Cleese without any other members of Monty Python?“
What is needed to do better?
● ontological representation of entities and facts
„An ontology is a specification of a conceptualization.“ (Gruber, 1993)
⇒ formal description of concepts and relationships
What is needed to do better?
● ontological representation of entities and facts
● well-defined taxonomy of entity types
● assertions about entities and their relations
A British Comedy is a kind of Comedy. A Comedy is a kind of Film.
⇒ A British Comedy is a kind of Film.
Clockwise is a British Comedy. John Cleese stars in Clockwise.
⇒ John Cleese stars in a Film.
● thoroughly specified, machine-actionable, but flexible formalism for representation
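The inference chain on this slide can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions, not how DBpedia or an RDF reasoner is implemented: the dictionaries and the helper names `superclasses` and `stars_in_a` are hypothetical, and the data is limited to the slide's example.

```python
# Hypothetical toy taxonomy: each type maps to its direct parent type.
SUBCLASS_OF = {"BritishComedy": "Comedy", "Comedy": "Film"}

# Assertions about entities (illustrative data taken from the slide).
TYPE_OF = {"Clockwise": "BritishComedy"}
STARS_IN = {"John Cleese": ["Clockwise"]}


def superclasses(cls):
    """Return cls followed by all of its ancestors in the taxonomy."""
    chain = [cls]
    while chain[-1] in SUBCLASS_OF:
        chain.append(SUBCLASS_OF[chain[-1]])
    return chain


def stars_in_a(actor, cls):
    """True if the actor stars in at least one work whose type is cls or a subtype of cls."""
    return any(cls in superclasses(TYPE_OF[work]) for work in STARS_IN.get(actor, []))


print(superclasses("BritishComedy"))      # ['BritishComedy', 'Comedy', 'Film']
print(stars_in_a("John Cleese", "Film"))  # True
```

The point of the sketch is that "John Cleese stars in a Film" is never stated directly; it follows from the type hierarchy plus two assertions.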
DBpedia - motivation and use cases
an RDF view of structured Wikipedia information enables:
● sophisticated queries
⇒ cross-referencing facts of entities
⇒ filtering of entities based on their types and fact assertions
● combining facts from Wikipedia with machine-actionable knowledge from other structured datasets (Geodata, Yellowpages, WordNet, ...)
Another take on Question Answering
„Which films starred John Cleese without any other members of Monty Python?“
DBpedia - contents and datasets
● Wikipedia article ⇔ DBpedia resource
http://en.wikipedia.org/wiki/Monty_Python ⇔ http://dbpedia.org/resource/Monty_Python
● mapping-based types and facts governed by the DBpedia Ontology
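The article-to-resource correspondence can be expressed as a small helper. This is a minimal sketch: the function name `wikipedia_to_dbpedia` is hypothetical, and it assumes the article title carries over unchanged, as in the Monty Python example. Titles with special characters or non-English editions are not handled.

```python
from urllib.parse import urlsplit

WIKIPEDIA_PREFIX = "/wiki/"
DBPEDIA_RESOURCE = "http://dbpedia.org/resource/"


def wikipedia_to_dbpedia(article_url):
    """Map an English Wikipedia article URL to the corresponding DBpedia resource URI."""
    parts = urlsplit(article_url)
    if parts.netloc != "en.wikipedia.org" or not parts.path.startswith(WIKIPEDIA_PREFIX):
        raise ValueError(f"not an English Wikipedia article URL: {article_url}")
    # Assumption: the title part of the path is reused verbatim.
    title = parts.path[len(WIKIPEDIA_PREFIX):]
    return DBPEDIA_RESOURCE + title


print(wikipedia_to_dbpedia("http://en.wikipedia.org/wiki/Monty_Python"))
# prints http://dbpedia.org/resource/Monty_Python
```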
● 4.58 million entities and 583 million triples (English DBpedia 2014)
– 131.2 million fact assertions (derived from infoboxes)
– 168.5 million triples representing Wikipedia structure
– 57.1 million links to external datasets
● DBpedia resources are categorised in several ways:
– by Wikipedia categories (represented in SKOS)
– by YAGO classification
– by links to WordNet synsets
– by assignment of classes from the DBpedia ontology
● Provenance metadata
⇒ From which part of which Wikipedia page was a triple derived?
Mappings Wiki
a community effort to:
– develop an ontology schema
– provide mappings from Wikipedia infobox properties to this ontology
→ creating an alignment between Wikipedia and DBpedia
→ eliminating name variations in properties and classes
→ big boost for precision
DBpedia Ontology
cross-domain ontology
– maintained and extended by the community in the DBpedia Mappings Wiki
– manually created based on the most commonly used infoboxes
– currently covers 685 classes which form a subsumption hierarchy and are described by 2,795 different properties
– subsumption hierarchy with a maximal depth of 5
[Figure: DBpedia Ontology extract]
Wikipedia articles
– Wikipedia articles consist mostly of free text
– also comprise various types of structured information
– including: infobox templates, categorisation information, images, geo-coordinates, links to external web pages, disambiguation pages, redirects between pages, other language links
Structure in Wikipedia
– Title
– Abstract
– Infoboxes
– Geo-coordinates
– Categories
– Images
– Links
  » other language versions
  » other Wikipedia pages
  » to the Web
  » Redirects
  » Disambiguations
Infobox encoding

{{Infobox Korean settlement
| title      = Busan Metropolitan City
| img        = Busan.jpg
| imgcaption = A view of the [[Geumjeong]] district in Busan
| hangul     = 부산 광역시
...
| area_km2   = 763.46
| pop        = 3635389
| popyear    = 2006
| mayor      = Hur Nam-sik
| divs       = 15 wards (Gu), 1 county (Gun)
| region     = [[Yeongnam]]
| dialect    = [[Gyeongsang]]
}}

dbp:Busan dbp:title "Busan Metropolitan City"
dbp:Busan dbp:hangul "부산 광역시"@Hang
dbp:Busan dbp:area_km2 "763.46"^^xsd:float
dbp:Busan dbp:pop "3635389"^^xsd:int
dbp:Busan dbp:region dbp:Yeongnam
dbp:Busan dbp:dialect dbp:Gyeongsang
...
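The raw infobox encoding shown on this slide can be approximated as follows. This is a simplified sketch, not the DBpedia extraction framework: the function names `encode_value` and `infobox_to_triples` are hypothetical, only single `[[link]]` values and plain numbers are recognised, and language tags such as the one on the hangul value are not reproduced.

```python
import re

INFOBOX = """{{Infobox Korean settlement
| title = Busan Metropolitan City
| area_km2 = 763.46
| pop = 3635389
| region = [[Yeongnam]]
}}"""


def encode_value(raw):
    """Turn a raw infobox value into an object term (resource, typed literal, or plain string)."""
    link = re.fullmatch(r"\[\[([^\]|]+)\]\]", raw)
    if link:
        return "dbp:" + link.group(1).replace(" ", "_")
    if re.fullmatch(r"-?\d+", raw):
        return f'"{raw}"^^xsd:int'
    if re.fullmatch(r"-?\d+\.\d+", raw):
        return f'"{raw}"^^xsd:float'
    return f'"{raw}"'


def infobox_to_triples(subject, wikitext):
    """Return one (subject, predicate, object) tuple per '| key = value' line."""
    triples = []
    for line in wikitext.splitlines():
        match = re.match(r"\|\s*(\w+)\s*=\s*(.+)", line.strip())
        if match:
            key, value = match.group(1), match.group(2).strip()
            triples.append((subject, "dbp:" + key, encode_value(value)))
    return triples


for triple in infobox_to_triples("dbp:Busan", INFOBOX):
    print(*triple)
```

Every key becomes a property as-is, which is exactly why differently named infobox fields lead to differently named properties.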
Heterogeneity in infoboxes

Björk (Musician)
  Occupation = Musician, Actor
  Born = 21.12.1965, Reykjavík

Brown (Prime Minister)
  office = Prime Minister of the UK
  birth_date = 20.4.1951
  birth_place = Govan

Romero (Actor)
  occupation = Actor, Editor
  birthdate = 4.2.1940
  birthplace = New York
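The name variations above are what mapping-based extraction is meant to smooth over. A rough sketch of the idea, assuming a hand-written lookup table: `KEY_TO_PROPERTY` and `normalise` are hypothetical, and the table is illustrative rather than taken from the DBpedia Mappings Wiki. The combined "Born" field (date plus place) would need splitting first and is deliberately left out.

```python
# Hypothetical mapping from raw infobox keys to ontology properties.
# Real mappings are maintained per infobox template by the community.
KEY_TO_PROPERTY = {
    "occupation": "dbo:occupation",
    "office": "dbo:office",
    "birth_date": "dbo:birthDate",
    "birthdate": "dbo:birthDate",
    "birth_place": "dbo:birthPlace",
    "birthplace": "dbo:birthPlace",
}


def normalise(infobox):
    """Rename raw infobox keys to ontology properties; unmapped keys are dropped."""
    result = {}
    for key, value in infobox.items():
        prop = KEY_TO_PROPERTY.get(key.strip().lower())
        if prop is not None:
            result[prop] = value
    return result


brown = {"office": "Prime Minister of the UK", "birth_date": "20.4.1951", "birth_place": "Govan"}
romero = {"occupation": "Actor, Editor", "birthdate": "4.2.1940", "birthplace": "New York"}

print(normalise(brown))
print(normalise(romero))
```

After normalisation both records expose the same `dbo:birthDate` and `dbo:birthPlace` keys, so a single query pattern can cover them.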
DBpedia Extraction Framework
DIEF - DBpedia Information Extraction Framework
– extracts structured information from Wikipedia and turns it into a rich knowledge base
– Mapping-Based Infobox Extraction, Raw Infobox Extraction, Feature Extraction, Statistical Extraction
– hosted on GitHub
– written in Scala & Java
Accessing DBpedia - Browsing
● official DBpedia mirror: http://dbpedia.org
⇒ runs on Virtuoso
⇒ faceted search with Virtuoso Facets
Accessing DBpedia - SPARQL
● official SPARQL endpoint: http://dbpedia.org/sparql
⇒ subject to a fair use policy (limited query runtime)
⇒ query with any SPARQL-compliant tool or API
Querying RDF with SPARQL
● SPARQL Protocol and RDF Query Language
⇒ graph patterns as sets of triples (with variables)
⇒ successful matches of graph patterns generate bindings in (sub-)query solutions
● different result types for queries
⇒ SELECT ⇒ bindings, ASK ⇒ true/false, CONSTRUCT ⇒ new graph
● combinators and modifiers for basic graph patterns
⇒ UNION, FILTER, MINUS, FILTER (NOT) EXISTS
● result set modifiers
⇒ LIMIT, OFFSET, DISTINCT, ORDER BY
● numerous functions and operators for resource and literal values
● many additions in the 1.1 revision: grouping & aggregates, regular property path expressions, sub-queries
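How triple patterns with variables produce bindings can be illustrated with a toy matcher. This is a conceptual sketch only and not a SPARQL engine: the functions `match_pattern` and `solve` and the sample triples are made up for illustration, and none of the modifiers listed above (FILTER, UNION, ORDER BY, ...) are implemented.

```python
# Toy triple store; the data is illustrative only.
TRIPLES = [
    ("Clockwise", "starring", "John_Cleese"),
    ("Clockwise", "type", "Film"),
    ("Brazil", "starring", "Michael_Palin"),
    ("Brazil", "type", "Film"),
]


def match_pattern(pattern, triple, binding):
    """Try to extend binding so that pattern matches triple; return None on failure."""
    new_binding = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable must keep the value it was bound to earlier.
            if new_binding.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return new_binding


def solve(patterns, triples):
    """Return all variable bindings that satisfy every triple pattern."""
    solutions = [{}]
    for pattern in patterns:
        extended_solutions = []
        for binding in solutions:
            for triple in triples:
                extended = match_pattern(pattern, triple, binding)
                if extended is not None:
                    extended_solutions.append(extended)
        solutions = extended_solutions
    return solutions


query = [("?film", "type", "Film"), ("?film", "starring", "?actor")]
for solution in solve(query, TRIPLES):
    print(solution)
```

Each returned dictionary corresponds to one row of a SELECT result: a consistent assignment of values to `?film` and `?actor`.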
SPARQL Query Examples
People who were born in Berlin before 1900:

SELECT ?name ?birth ?death ?person
WHERE {
  ?person foaf:name ?name .
  ?person dbo:deathDate ?death .
  ?person dbo:birthPlace :Berlin .
  ?person dbo:birthDate ?birth .
  FILTER (?birth < "1900-01-01"^^xsd:date)
}
ORDER BY ?name
[Figure: SPARQL query example]
SPARQL Tooling
● Flint SPARQL Editor: JavaScript SPARQL editor
– syntax highlighting, code assistance
– auto-completion for properties and classes (for small datasets)
● Protégé: full-fledged ontology editor
– good for getting an overview of the ontologies backing datasets
– two SPARQL plug-ins (one supporting entailment)
● curl or your favourite simple REST API
– allows simple testing of queries from any text editor with SPARQL syntax support (e.g. Emacs, Vim, Sublime Text)

$ curl -H 'Accept: application/json' --data-urlencode "query=$(cat query.sparql)" http://dbpedia.org/sparql
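The same request can be issued from a script. Below is a minimal sketch using only the Python standard library; the helper names `build_request` and `run_query` are hypothetical, and the endpoint URL and Accept header are copied from the curl call above. Actually running a query requires network access and is subject to the endpoint's fair use policy.

```python
import urllib.parse
import urllib.request

ENDPOINT = "http://dbpedia.org/sparql"


def build_request(query, endpoint=ENDPOINT):
    """Build a POST request mirroring the curl call: form-encoded query, JSON Accept header."""
    body = urllib.parse.urlencode({"query": query}).encode("utf-8")
    return urllib.request.Request(endpoint, data=body, headers={"Accept": "application/json"})


def run_query(query):
    """Send the query and return the raw response body as text (needs network access)."""
    with urllib.request.urlopen(build_request(query), timeout=30) as response:
        return response.read().decode("utf-8")


# Only build the request here; call run_query(...) to actually contact the endpoint.
request = build_request("SELECT * WHERE { ?s ?p ?o } LIMIT 1")
print(request.get_method(), request.full_url)
```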
Further Reading: Website
landing page: http://dbpedia.org/About
overview of datasets (also info on localized datasets): http://wiki.dbpedia.org/Datasets
DBpedia data access overview: http://wiki.dbpedia.org/OnlineAccess
Further Reading: Browsing
DBpedia VAD: http://dbpedia.org/page/DBpedia
DBpedia Facets: http://dbpedia.org/fct/
new DBpedia frontend:
  http://de.dbpedia.org/page/DBpedia (get an impression of the German DBpedia version)
  https://github.com/lukovnikov/ldviewer (source code)
Context platform:
  http://context.aksw.org/app/hub.php?corpus=6&action=facets (online demo for browsing the LOD2 Blog)
  http://context.aksw.org/app/ (project home)