Assigning Global Relevance Scores to DBpedia Facts
Philipp Langer, Patrick Schulze, Stefan George, Tobias Metzke, Ziawasch Abedjan, Gjergji Kasneci
DESWeb 03/31/2014
Structured Data
■ Advantages of structured data over unstructured data:
□ Search for explicit facts
□ Summarization of possibly interesting information
□ Automated knowledge discovery
■ Google Knowledge Graph
■ RDF Knowledge bases
□ DBpedia, YAGO/NAGA
A handful of salient facts about the query entity.
Querying YAGO
■ Asking for classes to which Albert Einstein belongs
Querying DBpedia
■ Asking for classes to which Albert Einstein belongs
predicate object
rdf:type owl:Thing
rdf:type dbpedia:Agent
rdf:type dbpedia:Person
rdf:type dbpedia:Scientist
rdf:type umbel:Scientist
rdf:type schema:Person
rdf:type yago:Astronomer109818343
rdf:type foaf:Person
rdf:type 19th-centuryAmericanPeople
rdf:type 19th-centuryGermanPeople
Challenge
SELECT DISTINCT ?p ?o WHERE
{ dbpedia:Barack_Obama ?p ?o }
p o
rdf:type owl:Thing
rdf:type dbpedia:Person
rdf:type yago:Person100007846
... ...
rdf:type dbpedia:Politician
... ...
dbpedia:spouse dbpedia:Michelle_Obama
Web Documents
p o
owl:orderInOffice President of the United States
dbpedia:type dbpedia:Politician
dbpedia:spouse dbpedia:Michelle_Obama
owl:birthPlace dbpedia:Honolulu
dbpprop:residence dbpedia:White_House
.... .....
rdf:type owl:Thing
Challenges
■ Big data: DBpedia 3.8, ClueWeb corpus
■ Architecture: text extraction, score computation/ranking, query processing
■ Evaluation: conducting user studies
■ Ranking strategies: improve the ranking results
Overview
Languages:
• Python
• Java
• SPARQL
• JavaScript
Frameworks:
• Django
• Lucene
Components:
• Web application (Django): user studies, querying
• DBpedia endpoint (Apache Jena)
• Application data (Postgres)
• Web corpus (Lucene index)
• Ranking strategies: intra-DBpedia strategies, web corpus strategies
Ranking Facts
■ Query types:
□ Subject queries - return all physicists
□ Property queries - return all facts related to Einstein
■ Ranking strategies
□ Ranking by frequency and document frequency
□ Ranking by information diversity
□ Random walk
□ Web-based co-occurrence statistics
SELECT ?p ?o { Albert_Einstein ?p ?o }
SELECT ?s { ?s type Physicist }
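The two abbreviated queries above can be expanded into complete SPARQL strings. The following is a minimal sketch; the helper names `property_query` and `subject_query` are hypothetical, and full IRIs are passed in so that no prefix mapping has to be assumed.

```python
# Hypothetical helpers that build the two query types as SPARQL strings.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def property_query(subject_iri: str) -> str:
    """Property query: all (predicate, object) pairs of one subject."""
    return f"SELECT DISTINCT ?p ?o WHERE {{ <{subject_iri}> ?p ?o }}"


def subject_query(class_iri: str) -> str:
    """Subject query: all subjects that are instances of one class."""
    return f"SELECT DISTINCT ?s WHERE {{ ?s <{RDF_TYPE}> <{class_iri}> }}"


einstein_facts = property_query("http://dbpedia.org/resource/Albert_Einstein")
```

Either string can then be sent to any SPARQL endpoint; the ranking strategies operate on the returned result set.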
Ranking by frequency and document frequency
Subject document of „Albert Einstein“:
<Albert_Einstein>
<topic> <Nobel_laureates>;
<topic> <Theoretical_physicists>;
<topic> <German_physicists>;
<topic> <American_inventors>;
<type> <Scientist>;
<type> <Person>;
<type> <Thing>;
<residence> "Switzerland";
<residence> "Austria-Hungary";
<residence> "German Empire";
<spouse> "Mileva Maric";
...
Predicate document of „topic“:
<Newton> <topic> <Theoretical_physicists>.
<Newton> <topic> <Nobel_laureates>.
<Newton> <topic> <Mathematicians>.
<Newton> <topic> <Optical_physicists>.
<Newton> <topic> <History_of_calculus>.
<Newton> <topic> <English_alchemists>.
<Einstein> <topic> <Theoretical_physicists>.
<Einstein> <topic> <Nobel_laureates>.
<Einstein> <topic> <German_physicists>.
<Einstein> <topic> <American_inventors>.
Object document of „Theoretical physicists“:
<Isaac_Newton> <topic> <Theoretical_physicists>.
<Albert_Einstein> <topic> <Theoretical_physicists>.
<Bruno_Coppi> <topic> <Theoretical_physicists>.
<Ravi_Gomatam> <topic> <Theoretical_physicists>.
...
[Shady et al ESWC’11]
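The three document types can be derived from a list of triples by grouping on subject, predicate, and object. The sketch below uses a small toy triple set and hypothetical function names; it only illustrates the grouping and a document-frequency count over subject documents, not the exact scoring used in the cited work.

```python
from collections import Counter, defaultdict

# Toy data for illustration only (not taken from DBpedia).
triples = [
    ("Albert_Einstein", "topic", "Theoretical_physicists"),
    ("Albert_Einstein", "topic", "Nobel_laureates"),
    ("Albert_Einstein", "type", "Scientist"),
    ("Isaac_Newton", "topic", "Theoretical_physicists"),
    ("Isaac_Newton", "topic", "Mathematicians"),
    ("Isaac_Newton", "type", "Scientist"),
    ("Bruno_Coppi", "topic", "Theoretical_physicists"),
]


def build_documents(triples):
    """Group triples into subject, predicate, and object documents."""
    subject_docs = defaultdict(list)
    predicate_docs = defaultdict(list)
    object_docs = defaultdict(list)
    for s, p, o in triples:
        subject_docs[s].append((p, o))
        predicate_docs[p].append((s, o))
        object_docs[o].append((s, p))
    return subject_docs, predicate_docs, object_docs


def document_frequency(subject_docs):
    """Count in how many subject documents each (predicate, object) pair occurs."""
    df = Counter()
    for facts in subject_docs.values():
        df.update(set(facts))
    return df


subject_docs, predicate_docs, object_docs = build_documents(triples)
df = document_frequency(subject_docs)
```

In this toy set, the pair `("topic", "Theoretical_physicists")` appears in three subject documents, so a frequency-based ranking would treat it as a common fact.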
Ranking by frequency and document frequency
■ Subject queries:
□ Global relevance
Isaac Newton
academicAdvisor ...; birthDate ...; birthPlace ...; comment ...; ethnicity ...; field ...; influenced ...; influencedBy ...; knownFor ...; label ...; notableStudent ...; subject ...; subject ...; type ...;
Ravi Gomatam
subject ...; subject ...; subject ...; subject ...; subject ...;
Limitations for Property Queries
■ Property queries:
□ Facts should be globally relevant but also distinctive for the given subject
– type Person vs. type Scientist
Ranking by diversity
■ Following a probabilistic model
□ Property queries:
– Properties and objects that are as discriminative as possible
□ Subject queries:
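The probabilistic model itself is not spelled out here. Purely as an assumption, one way to make "as discriminative as possible" concrete is self-information: a (predicate, object) pair shared by a fraction P of all subjects scores -log2(P), so rare pairs rank above ubiquitous ones. The function name and toy data below are hypothetical.

```python
import math
from collections import Counter


def discriminativeness(subject_facts):
    """Score each (predicate, object) pair by -log2 of the share of subjects having it.

    Assumption for illustration: this is one possible notion of
    discriminativeness, not necessarily the model used in the talk.
    """
    n_subjects = len(subject_facts)
    df = Counter()
    for facts in subject_facts.values():
        df.update(set(facts))
    return {fact: -math.log2(count / n_subjects) for fact, count in df.items()}


# Toy data for illustration only.
subject_facts = {
    "Albert_Einstein": [("type", "Person"), ("type", "Scientist")],
    "Isaac_Newton": [("type", "Person"), ("type", "Scientist")],
    "Barack_Obama": [("type", "Person"), ("type", "Politician")],
    "Michelle_Obama": [("type", "Person")],
}
scores = discriminativeness(subject_facts)
```

Under this assumption, `type Person` (held by every subject) scores zero, while `type Scientist` scores higher, matching the Person vs. Scientist contrast raised for property queries.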
Random Walk Model
■ Consider the knowledge base as a directed graph
□ Already applied in [Kasneci CIKM’09]
□ Problem: literals have no outgoing link
■ Use Wiki Pagelinks and Infobox Property Mappings
□ Entities with high indegree, such as countries, are favored
– Good for subject queries
– Bad for property queries
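The preference for high-indegree entities can be seen in a generic PageRank-style power iteration. This is a sketch of the general random-walk technique, not the specific model of [Kasneci CIKM’09]; the damping value, the toy graph, and the uniform redistribution of mass from nodes without outgoing links (such as literals) are all assumptions made for illustration.

```python
def random_walk_scores(edges, damping=0.85, iterations=50):
    """PageRank-style scores for a directed graph given as (source, target) pairs."""
    nodes = sorted({n for edge in edges for n in edge})
    out_links = {n: [] for n in nodes}
    for source, target in edges:
        out_links[source].append(target)

    score = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Nodes without outgoing links (e.g. literals) spread their mass uniformly.
        dangling = sum(score[n] for n in nodes if not out_links[n])
        base = (1.0 - damping + damping * dangling) / len(nodes)
        new_score = {n: base for n in nodes}
        for n in nodes:
            for target in out_links[n]:
                new_score[target] += damping * score[n] / len(out_links[n])
        score = new_score
    return score


# Toy graph for illustration only; the quoted node stands in for a literal.
edges = [
    ("Albert_Einstein", "Germany"),
    ("Max_Planck", "Germany"),
    ("Werner_Heisenberg", "Germany"),
    ("Werner_Heisenberg", "Max_Planck"),
    ("Albert_Einstein", '"1879-03-14"'),
]
scores = random_walk_scores(edges)
```

In this toy graph the country node collects links from all three persons and ends up with the highest score, which illustrates why such a ranking helps subject queries but pushes generic objects to the top of property queries.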
Web Documents
Co-occurrence statistics
■ Lemur Project ClueWeb09 Category-B web corpus
□ 50 million web documents (1.5 TB)
□ Only English-language documents
□ Includes approx. 2.7 million Wikipedia articles
■ Create an inverted index
■ Treat text windows of different word-distance limits as documents
■ Rank subject-object pairs
□ „Albert Einstein“ and „Physicist“
□ Store only pairwise co-occurrence:
□ Compute frequency of s:
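Counting co-occurrences within a word-distance limit can be sketched as follows. The sketch assumes the text is already tokenized and that each entity label has been normalized to a single token; the function names and the sample sentence are hypothetical, and a real system would read positions from the inverted index instead of scanning tokens.

```python
def term_frequency(tokens, term):
    """Number of occurrences of a term in the token list."""
    return sum(1 for t in tokens if t == term)


def cooccurrence_count(tokens, term_a, term_b, max_distance):
    """Count occurrence pairs of term_a and term_b at most max_distance tokens apart."""
    positions_a = [i for i, t in enumerate(tokens) if t == term_a]
    positions_b = [i for i, t in enumerate(tokens) if t == term_b]
    return sum(
        1
        for i in positions_a
        for j in positions_b
        if abs(i - j) <= max_distance
    )


# Toy sentence for illustration only.
tokens = (
    "einstein was a physicist born in ulm and the physicist "
    "later moved to princeton"
).split()

near = cooccurrence_count(tokens, "einstein", "physicist", max_distance=5)
far = cooccurrence_count(tokens, "einstein", "physicist", max_distance=10)
```

Widening the distance limit from 5 to 10 tokens picks up the second mention of "physicist", which is why different limits yield different rankings for the same subject-object pair.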
Evaluation
■ User study 1
□ 8 queries
□ all results
□ 12 users
□ 19 approaches/configurations
□ Rating scale 1-4: irrelevant to highly relevant
■ User study 2
□ 8+20 queries
□ top-10 results of best 4 approaches side-by-side
□ 10 users
□ Best 3 approaches from user study 1
Top 4 Approaches in User study 1
User study 2
Results Example: Theoretical Physicists
Subject
Albert Einstein
Isaac Newton
Galileo Galilei
James Clerk Maxwell
Richard Feynman
Stephen Hawking
Max Planck
Enrico Fermi
Werner Heisenberg
Pierre-Simon Laplace
DBpedia Random Walk Model
Results Example: Albert Einstein
DBpedia:
predicate object
rdf:type owl:Thing
rdf:type dbpedia:Agent
rdf:type dbpedia:Person
rdf:type dbpedia:Scientist
rdf:type umbel:Scientist
rdf:type schema:Person
rdf:type yago:Astronomer109818343
rdf:type foaf:Person
rdf:type 19th-centuryAmericanPeople
rdf:type 19th-centuryGermanPeople
Co-occurrence statistics:
predicate object
fields Physics
field Physics
deathPlace United States
placeOfDeath United States
shortDescription Physicists
description Physicist
type Scientist
ethnicity Jewish
subject Einstein family
residence Switzerland
Conclusions
■ Investigated multiple approaches to rank DBpedia facts
□ Information theory, statistical reasoning, random walk, and co-occurrence statistics in web documents
■ The DBpedia knowledge base already provides enough information to improve the ranking of results
■ Improvement of property queries through web-based co-occurrence statistics
■ We provide the annotated datasets at
□ https://www.hpi.uni-potsdam.de/naumann/sites/dbpedia/