Assigning Global Relevance Scores to DBpedia Facts

Philipp Langer, Patrick Schulze, Stefan George, Tobias Metzke, Ziawasch Abedjan, Gjergji Kasneci

DESWeb 03/31/2014

Structured Data

■ Advantages of structured data over unstructured data:

□ Search for explicit facts

□ Summarization of possibly interesting information

□ Automated knowledge discovery

■ Google Knowledge Graph

■ RDF Knowledge bases

□ DBpedia, YAGO/NAGA

A handful of salient facts about the query entity.

Querying YAGO

■ Asking for classes to which Albert Einstein belongs

Querying DBpedia

■ Asking for classes to which Albert Einstein belongs

predicate object

rdf:type owl:Thing

rdf:type dbpedia:Agent

rdf:type dbpedia:Person

rdf:type dbpedia:Scientist

rdf:type umbel:Scientist

rdf:type schema:Person

rdf:type yago:Astronomer109818343

rdf:type foaf:Person

rdf:type 19th-centuryAmericanPeople

rdf:type 19th-centuryGermanPeople

Challenge

SELECT DISTINCT ?p ?o WHERE { dbpedia:Barack_Obama ?p ?o }

p o

rdf:type owl:Thing

rdf:type dbpedia:Person

rdf:type yago:Person100007846

... ...

rdf:type dbpedia:Politician

... ...

dbpedia:spouse dbpedia:Michelle_Obama

Web Documents

p o

owl:orderInOffice President of the United States

dbpedia:type dbpedia:Politician

dbpedia:spouse dbpedia:Michelle_Obama

owl:birthPlace dbpedia:Honolulu

dbpprop:residence dbpedia:White_House

... ...

rdf:type owl:Thing

Challenges

■ Big Data: DBpedia 3.8, ClueWeb corpus

■ Architecture: text extraction, score computation/ranking, query processing

■ Evaluation: conducting user studies

■ Ranking strategies: improve the ranking results

Overview

Languages:

• Python

• Java

• SPARQL

• JavaScript

Frameworks:

• Django

• Lucene

Architecture (diagram components):

• Web application (Django)

• DBpedia Endpoint (Apache Jena)

• Application Data (Postgres)

• Web corpus (Lucene Index)

• User Studies, Querying

• Ranking strategies: Intra DBpedia strategies, Web Corpus strategies

Ranking Facts

■ Query types:

□ Subject queries - return all physicists

□ Property queries - return all facts related to Einstein

■ Ranking strategies

□ Ranking by frequency and document frequency

□ Ranking by information diversity

□ Random walk

□ Web-based co-occurrence statistics

Property query: SELECT ?p ?o { Albert_Einstein ?p ?o }

Subject query: SELECT ?s { ?s type Physicist }
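The two query types can be illustrated without a SPARQL endpoint. The following is a minimal sketch over a handful of toy triples; the names `TRIPLES`, `property_query`, and `subject_query` are hypothetical and are not part of the presented system.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]

# Toy triples for illustration only; they are not an excerpt from DBpedia.
TRIPLES: List[Triple] = [
    ("Albert_Einstein", "type", "Physicist"),
    ("Albert_Einstein", "spouse", "Mileva_Maric"),
    ("Isaac_Newton", "type", "Physicist"),
    ("Barack_Obama", "type", "Politician"),
]


def property_query(triples: List[Triple], subject: str) -> List[Tuple[str, str]]:
    """All (predicate, object) pairs of one subject: SELECT ?p ?o { subject ?p ?o }."""
    return [(p, o) for s, p, o in triples if s == subject]


def subject_query(triples: List[Triple], predicate: str, obj: str) -> List[str]:
    """All subjects matching a pattern: SELECT ?s { ?s predicate obj }."""
    return [s for s, p, o in triples if p == predicate and o == obj]


print(property_query(TRIPLES, "Albert_Einstein"))
print(subject_query(TRIPLES, "type", "Physicist"))
```

Both functions return results in storage order; a ranking strategy would then reorder these lists.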

Ranking by frequency and document frequency

Subject document of "Albert Einstein":

<Albert_Einstein>
<topic> <Nobel_laureates>;
<topic> <Theoretical_physicists>;
<topic> <German_physicists>;
<topic> <American_inventors>;
<type> <Scientist>;
<type> <Person>;
<type> <Thing>;
<residence> "Switzerland";
<residence> "Austria-Hungary";
<residence> "German Empire";
<spouse> "Mileva Maric";
...

Predicate document of "topic":

<Newton> <topic> <Theoretical_physicists>.
<Newton> <topic> <Nobel_laureates>.
<Newton> <topic> <Mathematicians>.
<Newton> <topic> <Optical_physicists>.
<Newton> <topic> <History_of_calculus>.
<Newton> <topic> <English_alchemists>.
<Einstein> <topic> <Theoretical_physicists>.
<Einstein> <topic> <Nobel_laureates>.
<Einstein> <topic> <German_physicists>.
<Einstein> <topic> <American_inventors>.

Object document of "Theoretical physicists":

<Isaac_Newton> <topic> <Theoretical_physicists>.
<Albert_Einstein> <topic> <Theoretical_physicists>.
<Bruno_Coppi> <topic> <Theoretical_physicists>.
<Ravi_Gomatam> <topic> <Theoretical_physicists>.
...

[Shady et al ESWC’11]
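The grouping of triples into subject, predicate, and object documents can be sketched as follows. The data is a small toy subset, and the final step (ordering a subject's facts by how many subject documents share them) is only an assumed use of the document frequency, not necessarily the scoring used in the presented work. `build_documents` and `document_frequency` are hypothetical helper names.

```python
from collections import defaultdict

# Toy triples for illustration only.
TRIPLES = [
    ("Albert_Einstein", "topic", "Theoretical_physicists"),
    ("Albert_Einstein", "topic", "Nobel_laureates"),
    ("Albert_Einstein", "type", "Person"),
    ("Isaac_Newton", "topic", "Theoretical_physicists"),
    ("Isaac_Newton", "type", "Person"),
    ("Bruno_Coppi", "topic", "Theoretical_physicists"),
    ("Bruno_Coppi", "type", "Person"),
    ("Ravi_Gomatam", "topic", "Theoretical_physicists"),
]


def build_documents(triples):
    """Group triples into one 'document' per subject, predicate, and object."""
    subject_docs = defaultdict(list)
    predicate_docs = defaultdict(list)
    object_docs = defaultdict(list)
    for s, p, o in triples:
        subject_docs[s].append((s, p, o))
        predicate_docs[p].append((s, p, o))
        object_docs[o].append((s, p, o))
    return subject_docs, predicate_docs, object_docs


def document_frequency(subject_docs, predicate, obj):
    """Number of subject documents that contain the (predicate, object) pair."""
    return sum(
        any(p == predicate and o == obj for _, p, o in doc)
        for doc in subject_docs.values()
    )


subject_docs, predicate_docs, object_docs = build_documents(TRIPLES)

# Assumed use: order Einstein's facts by how many subjects share them.
facts = [(p, o) for _, p, o in subject_docs["Albert_Einstein"]]
ranked = sorted(
    facts,
    key=lambda po: document_frequency(subject_docs, po[0], po[1]),
    reverse=True,
)
print(ranked)
```

On this toy data, widely shared facts such as `type Person` rank above the rarer `topic Nobel_laureates`, which shows why pure frequency favors globally common facts.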

Ranking by frequency and document frequency

■ Subject queries:

□ Global relevance

Isaac Newton

academicAdvisor ...; birthDate ...; birthPlace ...; comment ...; ethnicity ...; field ...; influenced ...; influencedBy ...; knownFor ...; label ...; notableStudent ...; subject ...; subject ...; type ...;

Ravi Gomatam

subject ...; subject ...; subject ...; subject ...; subject ...;

Limitations for Property Queries

■ Property queries:

□ Results should be globally relevant but also distinctive to the given subject

– e.g. type Person vs. type Scientist

Ranking by diversity

■ Following a probabilistic model

□ Property queries:

– Properties and objects that are as discriminative as possible

□ Subject queries:
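For property queries, one simple way to favor discriminative property-object pairs is to score each pair by its surprisal, log2(N / n), where N is the number of subjects and n the number of subjects carrying the pair. This is an assumption made for illustration; the probabilistic model on the slide is not spelled out here, and `surprisal_scores` is a hypothetical name.

```python
import math
from collections import Counter

# Toy triples for illustration only.
TRIPLES = [
    ("Albert_Einstein", "type", "Person"),
    ("Albert_Einstein", "type", "Scientist"),
    ("Isaac_Newton", "type", "Person"),
    ("Isaac_Newton", "type", "Scientist"),
    ("Barack_Obama", "type", "Person"),
    ("Michelle_Obama", "type", "Person"),
]


def surprisal_scores(triples):
    """Map each (predicate, object) pair to log2(N / n).

    N is the number of distinct subjects, n the number of subjects that
    carry the pair. Rare pairs get high scores, ubiquitous pairs score 0.
    """
    unique_triples = set(triples)
    num_subjects = len({s for s, _, _ in unique_triples})
    pair_counts = Counter((p, o) for _, p, o in unique_triples)
    return {
        pair: math.log2(num_subjects / count)
        for pair, count in pair_counts.items()
    }


scores = surprisal_scores(TRIPLES)
for pair, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(pair, score)
```

Under this scoring, `type Scientist` outranks `type Person`, matching the intuition that a fact shared by every subject tells the user little about any one of them.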

Random Walk Model

■ Consider the knowledge base as a directed graph

□ Already applied in [Kasneci CIKM’09]

□ Problem: literals have no outgoing link

■ Use Wiki Pagelinks and Infobox Property Mappings

□ Entities with high indegree, such as countries, are favored

– Good for subject queries

– Bad for property queries
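A random walk over a directed graph can be sketched with a PageRank-style power iteration. The sketch below is a generic version, not the presented implementation: the damping factor, the iteration count, the toy edges, and the uniform redistribution of score from nodes without outgoing links (such as literals) are all assumptions.

```python
def random_walk_scores(edges, damping=0.85, iterations=50):
    """PageRank-style scores for a directed graph given as (source, target) pairs."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for source, target in edges:
        out_links[source].append(target)

    n = len(nodes)
    score = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Score held by nodes without outgoing links (e.g. literals) is
        # spread uniformly over all nodes so that no probability mass is lost.
        dangling = sum(score[node] for node in nodes if not out_links[node])
        new_score = {
            node: (1.0 - damping) / n + damping * dangling / n for node in nodes
        }
        for source in nodes:
            for target in out_links[source]:
                new_score[target] += damping * score[source] / len(out_links[source])
        score = new_score
    return score


# Toy graph for illustration only; the literal is a node without outgoing links.
EDGES = [
    ("Albert_Einstein", "Germany"),
    ("Max_Planck", "Germany"),
    ("Werner_Heisenberg", "Germany"),
    ("Albert_Einstein", '"1879-03-14"'),
]

scores = random_walk_scores(EDGES)
for node, value in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{node}: {value:.3f}")
```

In this toy graph the country node collects the highest score because three entities link to it, which mirrors the indegree effect noted on the slide.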

Web Documents

Co-occurrence statistics

■ Lemur Project ClueWeb09 Category-B web corpus

□ 50 million web documents (1.5 TB)

□ Only English-language documents

□ Includes approx. 2.7 million Wikipedia articles

■ Create an inverted index

■ Consider word windows with different distance limits as "documents" for co-occurrence

■ Rank subject-object pairs

□ "Albert Einstein" and "Physicist"

□ Store only pairwise co-occurrence:

□ Compute frequency of s:
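Window-based co-occurrence counting can be sketched on tokenized text as follows. The normalization (co-occurrences divided by the frequency of the subject term), the single-token terms, and the function names are assumptions made for illustration; the actual system works on a Lucene index over the web corpus.

```python
def count_cooccurrences(tokens, subject, obj, window):
    """Count occurrences of `subject` that have `obj` within `window` tokens."""
    subject_positions = [i for i, token in enumerate(tokens) if token == subject]
    object_positions = [i for i, token in enumerate(tokens) if token == obj]
    return sum(
        any(abs(i - j) <= window for j in object_positions)
        for i in subject_positions
    )


def cooccurrence_score(documents, subject, obj, window=5):
    """Co-occurrences of (subject, obj) divided by the occurrences of subject."""
    cooc = sum(count_cooccurrences(doc, subject, obj, window) for doc in documents)
    freq = sum(doc.count(subject) for doc in documents)
    return cooc / freq if freq else 0.0


# Toy tokenized documents for illustration only.
DOCS = [
    "einstein was a physicist born in ulm".split(),
    "einstein played the violin".split(),
    "the physicist newton studied optics".split(),
]

print(cooccurrence_score(DOCS, "einstein", "physicist", window=5))
print(cooccurrence_score(DOCS, "einstein", "physicist", window=2))
print(cooccurrence_score(DOCS, "einstein", "optics", window=5))
```

Shrinking the window from 5 to 2 tokens removes the match in the first document, which illustrates how the word-distance limit changes which subject-object pairs count as co-occurring.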

Evaluation

■ User study 1

□ 8 queries

□ all results

□ 12 users

□ 19 approaches/configurations

□ Rating scale 1-4: irrelevant to highly relevant

■ User study 2

□ 8+20 queries

□ top-10 results of best 4 approaches side-by-side

□ 10 users

□ Best 3 approaches from user study 1

Top 4 Approaches in User study 1

User study 2

Results Example: Theoretical Physicists

Subject

Albert Einstein

Isaac Newton

Galileo Galilei

James Clerk Maxwell

Richard Feynman

Stephen Hawking

Max Planck

Enrico Fermi

Werner Heisenberg

Pierre-Simon Laplace

DBpedia Random Walk Model

Results Example: Albert Einstein

DBpedia

predicate object

rdf:type owl:Thing

rdf:type dbpedia:Agent

rdf:type dbpedia:Person

rdf:type dbpedia:Scientist

rdf:type umbel:Scientist

rdf:type schema:Person

rdf:type yago:Astronomer109818343

rdf:type foaf:Person

rdf:type 19th-centuryAmericanPeople

rdf:type 19th-centuryGermanPeople

Co-occurrence statistics

predicate object

fields Physics

field Physics

deathPlace United States

placeOfDeath United States

shortDescription Physicists

description Physicist

type Scientist

ethnicity Jewish

subject Einstein family

residence Switzerland

Conclusions

■ Investigated multiple approaches to rank DBpedia facts

□ Information theory, statistical reasoning, random walk, and co-occurrence statistics in web documents

■ The DBpedia knowledge base already provides enough information to improve the ranking of results

■ Improvement of property queries through web-based co-occurrence statistics

■ We provide the annotated datasets at

□ https://www.hpi.uni-potsdam.de/naumann/sites/dbpedia/