Managing Completeness of Web Data

Transcript of Managing Completeness of Web Data

Page 1: Managing Completeness of Web Data

Managing Completeness of Web Data

Fariz Darari
PhD Supervisor: Werner Nutt

Supported by the project MAGIC, funded by the province of Bolzano

Fariz Darari (unibz) Managing Completeness of Web Data Oct 20, 2015 1 / 38

Page 2: Managing Completeness of Web Data

About Us

Page 3: Managing Completeness of Web Data

Research Group

Sorted by distance to Werner’s office :)

Page 4: Managing Completeness of Web Data

Bozen-Bolzano

Page 5: Managing Completeness of Web Data

Motivation

Page 6: Managing Completeness of Web Data

Completeness statements are already there

Page 7: Managing Completeness of Web Data

However . . .

Completeness statements are available, but only in natural language

Unclear what data completeness & query completeness mean

No techniques to check whether data completeness entails query completeness

Page 8: Managing Completeness of Web Data

Solution Ideas

Completeness statements are available, but only in natural language

Solution: RDF-ize completeness statements

Unclear what data completeness & query completeness mean

Solution: Formalize data completeness & query completeness

No techniques to check whether data completeness entails query completeness

Solution: Develop techniques to check whether data completeness entails query completeness

Page 11: Managing Completeness of Web Data

Solutions

Page 12: Managing Completeness of Web Data

Background: RDF

$G_{rd} = \{(resDogs, dir, tarantino), (resDogs, act, tarantino)\}$

Page 13: Managing Completeness of Web Data

Background: SPARQL

SELECT: $Q^s_{dir} = (\{?m\}, \{(?m, dir, tarantino)\})$

ASK: $Q^a_{dir} = (\emptyset, \{(?m, dir, tarantino)\})$

CONSTRUCT: $Q^c_{dir} = (\{(?m, dir, tarantino)\}, \{(?m, dir, tarantino)\})$
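To make the three query forms concrete, here is a minimal Python sketch that evaluates a basic graph pattern (BGP) over a set of triples and derives SELECT, ASK, and CONSTRUCT answers from it. It assumes triples are 3-tuples of strings and variables are strings starting with "?"; the helper names (`eval_bgp`, `select`, `ask`, `construct`) are hypothetical, not taken from the talk or from any library. SELECT answers are kept as a list, i.e., with duplicates.

```python
# Sketch only: triples are 3-tuples of strings, variables start with "?".

def is_var(term):
    return term.startswith("?")


def extend(binding, pattern, triple):
    """Extend `binding` so that `pattern` maps onto `triple`, or return None."""
    result = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if result.setdefault(p, t) != t:
                return None  # variable already bound to a different term
        elif p != t:
            return None  # constant does not match
    return result


def eval_bgp(bgp, graph):
    """All mappings mu such that every triple of mu(bgp) is in graph."""
    bindings = [{}]
    for pattern in bgp:
        next_bindings = []
        for b in bindings:
            for triple in graph:
                new = extend(b, pattern, triple)
                if new is not None:
                    next_bindings.append(new)
        bindings = next_bindings
    return bindings


def instantiate(triple, mu):
    return tuple(mu[x] if is_var(x) else x for x in triple)


def select(variables, bgp, graph):
    # One row per matching mapping (bag semantics).
    return [tuple(mu[v] for v in variables) for mu in eval_bgp(bgp, graph)]


def ask(bgp, graph):
    return len(eval_bgp(bgp, graph)) > 0


def construct(template, bgp, graph):
    return {instantiate(t, mu) for mu in eval_bgp(bgp, graph) for t in template}


G_rd = {("resDogs", "dir", "tarantino"), ("resDogs", "act", "tarantino")}
P = [("?m", "dir", "tarantino")]

print(select(["?m"], P, G_rd))  # [('resDogs',)]
print(ask(P, G_rd))             # True
print(construct(P, P, G_rd))    # {('resDogs', 'dir', 'tarantino')}
```

Matching one triple pattern at a time and extending partial mappings is the simplest correct strategy; a real SPARQL engine would rely on indexes and join ordering instead.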

Page 14: Managing Completeness of Web Data

Story: Incomplete Data Source

An incomplete data source of Reservoir Dogs, $G_{dbp} = (G^a_{dbp}, G^i_{dbp})$:

$G^a_{dbp} = \{(resDogs, dir, tarantino)\}$

Page 15: Managing Completeness of Web Data

Story: Incomplete Data Source

An incomplete data source of Reservoir Dogs, $G_{dbp} = (G^a_{dbp}, G^i_{dbp})$:

$G^i_{dbp} = \{(resDogs, dir, tarantino), (resDogs, act, tarantino)\}$

Page 16: Managing Completeness of Web Data

Story: Completeness Statement

$G^a_{dbp} = \{(resDogs, dir, tarantino)\}$

$G^i_{dbp} = \{(resDogs, dir, tarantino), (resDogs, act, tarantino)\}$

From $(G^a_{dbp}, G^i_{dbp})$, we can say that DBpedia is complete for movies directed by Tarantino:

$C_{dir} = Compl((?m, dir, tarantino) \mid \emptyset)$

However, it is not complete for actors in movies directed by Tarantino:

$C_{act} = Compl((?m, act, ?a) \mid (?m, dir, tarantino))$

Page 18: Managing Completeness of Web Data

Story: Query Completeness

$G^a_{dbp} = \{(resDogs, dir, tarantino)\}$

$G^i_{dbp} = \{(resDogs, dir, tarantino), (resDogs, act, tarantino)\}$

Consequently, when we ask over DBpedia for all movies directed by Tarantino:

$Q_{dir} = (\{?m\}, \{(?m, dir, tarantino)\})$

the query is complete, that is, $Compl(Q_{dir})$ holds.

Page 19: Managing Completeness of Web Data

Story: Query Completeness

$G^a_{dbp} = \{(resDogs, dir, tarantino)\}$

$G^i_{dbp} = \{(resDogs, dir, tarantino), (resDogs, act, tarantino)\}$

However, if we ask for all movies directed by and starring Tarantino:

$Q_{dir+act} = (\{?m\}, \{(?m, dir, tarantino), (?m, act, tarantino)\})$

the query is not complete: $Compl(Q_{dir+act})$ does not hold, because the triple (resDogs, act, tarantino) is missing from $G^a_{dbp}$.

Page 20: Managing Completeness of Web Data

Incomplete Data Source

Definition (Incomplete Data Source). An incomplete data source is a pair of two graphs

$G = (G^a, G^i)$, where $G^a \subseteq G^i$.

We call $G^a$ the available graph and $G^i$ the ideal graph.

Page 21: Managing Completeness of Web Data

Completeness Statement

Definition (Completeness Statement). Let $P_1$ be a non-empty BGP and $P_2$ a BGP.

A completeness statement is defined as

$Compl(P_1 \mid P_2)$

where we call $P_1$ the pattern and $P_2$ the condition of the statement.

Page 22: Managing Completeness of Web Data

Satisfaction of Completeness Statements

To a statement $C = Compl(P_1 \mid P_2)$,

we associate the CONSTRUCT query

$Q_C = (P_1, P_1 \cup P_2)$.

Then we say:

C is satisfied by an incomplete data source $G = (G^a, G^i)$, written $G \models C$, if

$\llbracket Q_C \rrbracket_{G^i} \subseteq G^a$.
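The satisfaction condition can be checked directly on the DBpedia example from the earlier slides. This Python sketch uses a naive BGP evaluator and encodes a statement $Compl(P_1 \mid P_2)$ as a pair of triple-pattern lists; `satisfies` and the other helper names are hypothetical.

```python
# Sketch only: triples are 3-tuples of strings, variables start with "?".

def is_var(term):
    return term.startswith("?")


def extend(binding, pattern, triple):
    """Extend `binding` so that `pattern` maps onto `triple`, or return None."""
    result = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if result.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return result


def eval_bgp(bgp, graph):
    """All mappings mu such that every triple of mu(bgp) is in graph."""
    bindings = [{}]
    for pattern in bgp:
        next_bindings = []
        for b in bindings:
            for triple in graph:
                new = extend(b, pattern, triple)
                if new is not None:
                    next_bindings.append(new)
        bindings = next_bindings
    return bindings


def construct(template, bgp, graph):
    """Instantiate the template triples with every answer of bgp over graph."""
    return {
        tuple(mu[x] if is_var(x) else x for x in t)
        for mu in eval_bgp(bgp, graph)
        for t in template
    }


def satisfies(source, statement):
    """G |= Compl(P1 | P2) iff [[Q_C]]_{G^i} is a subset of G^a, Q_C = (P1, P1 ∪ P2)."""
    available, ideal = source
    pattern, condition = statement
    return construct(pattern, pattern + condition, ideal) <= available


G_a = {("resDogs", "dir", "tarantino")}
G_i = {("resDogs", "dir", "tarantino"), ("resDogs", "act", "tarantino")}
G_dbp = (G_a, G_i)

C_dir = ([("?m", "dir", "tarantino")], [])
C_act = ([("?m", "act", "?a")], [("?m", "dir", "tarantino")])

print(satisfies(G_dbp, C_dir))  # True
print(satisfies(G_dbp, C_act))  # False: (resDogs, act, tarantino) is not in G^a
```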

Page 23: Managing Completeness of Web Data

Completeness Statements in RDF

$C_{act} = Compl((?m, act, ?a) \mid (?m, dir, tarantino))$

    lv:dataset a void:Dataset ;
        c:hasComplStmt lv:csAct .

    lv:csAct c:hasPattern [ c:subject [ c:varName "m" ] ;
                            c:predicate s:actor ;
                            c:object [ c:varName "a" ] ] ;
             c:hasCondition [ c:subject [ c:varName "m" ] ;
                              c:predicate s:director ;
                              c:object lmdb:Quentin_Tarantino ] .
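The RDF encoding above can be produced mechanically from a statement. The following Python sketch rests on an assumption about how the vocabulary generalizes: it emits one `c:hasPattern` per triple of $P_1$ and one `c:hasCondition` per triple of $P_2$, treats non-variable terms as already-prefixed names, and, like the slide, omits the `@prefix` declarations. The function names are hypothetical.

```python
# Sketch: serialize Compl(P1 | P2) into the c: vocabulary shown on the slide.
# Assumption: one c:hasPattern per triple of P1, one c:hasCondition per triple of P2.

def term_to_turtle(term):
    if term.startswith("?"):
        # A variable becomes a blank node that carries its name.
        return f'[ c:varName "{term[1:]}" ]'
    return term  # assumed to be a prefixed name such as s:actor


def triple_to_turtle(triple):
    s, p, o = (term_to_turtle(t) for t in triple)
    return f"[ c:subject {s} ; c:predicate {p} ; c:object {o} ]"


def statement_to_turtle(dataset, stmt, pattern, condition):
    props = [("c:hasPattern", t) for t in pattern]
    props += [("c:hasCondition", t) for t in condition]
    body = " ;\n    ".join(f"{prop} {triple_to_turtle(t)}" for prop, t in props)
    return (
        f"{dataset} a void:Dataset ;\n"
        f"    c:hasComplStmt {stmt} .\n\n"
        f"{stmt} {body} ."
    )


ttl = statement_to_turtle(
    "lv:dataset",
    "lv:csAct",
    pattern=[("?m", "s:actor", "?a")],
    condition=[("?m", "s:director", "lmdb:Quentin_Tarantino")],
)
print(ttl)
```

Representing each variable as a blank node with a `c:varName` literal keeps the statement itself valid RDF, which is what allows it to be published alongside a VoID description.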

Page 24: Managing Completeness of Web Data

Query Completeness

Definition (Query Completeness). Let Q be a query. We write

$Compl(Q)$

to say that Q is complete.

An incomplete data source $G = (G^a, G^i)$ satisfies $Compl(Q)$, written $G \models Compl(Q)$, if

$\llbracket Q \rrbracket_{G^i} = \llbracket Q \rrbracket_{G^a}$.
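For a single incomplete data source, $Compl(Q)$ can be tested by evaluating Q on both graphs and comparing the results. The Python sketch below does this for the DBpedia example, assuming SELECT answers are compared as bags (multisets) via `collections.Counter`; the helper names are hypothetical.

```python
from collections import Counter

# Sketch only: triples are 3-tuples of strings, variables start with "?".

def is_var(term):
    return term.startswith("?")


def extend(binding, pattern, triple):
    """Extend `binding` so that `pattern` maps onto `triple`, or return None."""
    result = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if result.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return result


def eval_bgp(bgp, graph):
    """All mappings mu such that every triple of mu(bgp) is in graph."""
    bindings = [{}]
    for pattern in bgp:
        next_bindings = []
        for b in bindings:
            for triple in graph:
                new = extend(b, pattern, triple)
                if new is not None:
                    next_bindings.append(new)
        bindings = next_bindings
    return bindings


def answers(query, graph):
    """Bag of SELECT answers of query = (W, P) over graph."""
    variables, bgp = query
    return Counter(tuple(mu[v] for v in variables) for mu in eval_bgp(bgp, graph))


def is_complete(query, source):
    """G |= Compl(Q) iff [[Q]]_{G^i} = [[Q]]_{G^a}."""
    available, ideal = source
    return answers(query, ideal) == answers(query, available)


G_a = {("resDogs", "dir", "tarantino")}
G_i = {("resDogs", "dir", "tarantino"), ("resDogs", "act", "tarantino")}
G_dbp = (G_a, G_i)

Q_dir = (["?m"], [("?m", "dir", "tarantino")])
Q_dir_act = (["?m"], [("?m", "dir", "tarantino"), ("?m", "act", "tarantino")])

print(is_complete(Q_dir, G_dbp))      # True
print(is_complete(Q_dir_act, G_dbp))  # False
```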

Page 25: Managing Completeness of Web Data

Completeness Entailment

Problem Definition (Completeness Entailment). Let $\mathcal{C}$ be a set of completeness statements and Q a query.

We say that $\mathcal{C}$ entails the completeness of Q, written

$\mathcal{C} \models Compl(Q)$,

if every incomplete data source satisfying $\mathcal{C}$ also satisfies $Compl(Q)$.

Page 26: Managing Completeness of Web Data

Intuition: Completeness Entailment

Consider the set $\mathcal{C}_{dir,act} = \{C_{dir}, C_{act}\}$ of completeness statements and the query $Q_{dir+act} = (\{?m\}, P_{dir+act})$, where

$P_{dir+act} = \{(?m, dir, tarantino), (?m, act, tarantino)\}$.

Page 27: Managing Completeness of Web Data

Intuition: Completeness Entailment

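Consider the set $\mathcal{C}_{dir,act} = \{C_{dir}, C_{act}\}$ of completeness statements and the query $Q_{dir+act} = (\{?m\}, P_{dir+act})$. Freezing the variable ?m to a new IRI m turns $P_{dir+act}$ into the graph

$\tilde{P}_{dir+act} = \{(m, dir, tarantino), (m, act, tarantino)\}$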

Page 28: Managing Completeness of Web Data

Intuition: Completeness Entailment

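Consider the set $\mathcal{C}_{dir,act} = \{C_{dir}, C_{act}\}$ of completeness statements and the query $Q_{dir+act} = (\{?m\}, P_{dir+act})$.

$\tilde{P}_{dir+act} = \{(m, dir, tarantino), (m, act, tarantino)\}$

Therefore,

$\llbracket Q_{C_{dir}} \rrbracket_{\tilde{P}_{dir+act}} \cup \llbracket Q_{C_{act}} \rrbracket_{\tilde{P}_{dir+act}} = \{(m, dir, tarantino), (m, act, tarantino)\} = \tilde{P}_{dir+act}$.

Thus, $\mathcal{C}_{dir,act} \models Compl(Q_{dir+act})$.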

Page 31: Managing Completeness of Web Data

Prototypical Graph

$\tilde{P}_{dir+act} = \{(m, dir, tarantino), (m, act, tarantino)\}$

Definition (Prototypical Graph). Let $Q = (W, P)$ be a query.

The freeze mapping $\tilde{id}$ is defined as a mapping from each variable ?v in P to a new IRI v.

Instantiating the graph pattern P with $\tilde{id}$ yields the graph

$\tilde{P} := \tilde{id}\,P$,

which we call the prototypical graph of Q.
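The freeze mapping is easy to write down. In this Python sketch (hypothetical names), each variable ?v is mapped to the IRI obtained by dropping the "?", matching the slide's ?m ↦ m; this assumes the resulting names do not already occur in P, which a real implementation would have to guarantee, for example by using a reserved namespace.

```python
def freeze_mapping(bgp):
    """Freeze mapping: each variable ?v goes to a new IRI v (here: the name without '?')."""
    variables = {term for triple in bgp for term in triple if term.startswith("?")}
    return {v: v[1:] for v in variables}


def freeze(bgp):
    """Prototypical graph: apply the freeze mapping to every triple pattern."""
    mapping = freeze_mapping(bgp)
    return {tuple(mapping.get(term, term) for term in triple) for triple in bgp}


P_dir_act = [("?m", "dir", "tarantino"), ("?m", "act", "tarantino")]
print(sorted(freeze(P_dir_act)))  # [('m', 'act', 'tarantino'), ('m', 'dir', 'tarantino')]
```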

Page 32: Managing Completeness of Web Data

Transfer Operator

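$\llbracket Q_{C_{dir}} \rrbracket_{\tilde{P}_{dir+act}} \cup \llbracket Q_{C_{act}} \rrbracket_{\tilde{P}_{dir+act}}$

Definition (Transfer Operator). For any set $\mathcal{C}$ of completeness statements and a graph G, we define the transfer operator $T_{\mathcal{C}}$, which computes the union of the evaluations over G of the CONSTRUCT queries of all statements in $\mathcal{C}$:

$T_{\mathcal{C}}(G) = \bigcup_{C \in \mathcal{C}} \llbracket Q_C \rrbracket_G$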
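Here is a Python sketch of the transfer operator, applied to the prototypical graph of $Q_{dir+act}$; it reproduces the union computed on the intuition slide. Statements are encoded as $(P_1, P_2)$ pairs and all helper names are hypothetical.

```python
# Sketch only: triples are 3-tuples of strings, variables start with "?".

def is_var(term):
    return term.startswith("?")


def extend(binding, pattern, triple):
    """Extend `binding` so that `pattern` maps onto `triple`, or return None."""
    result = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if result.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return result


def eval_bgp(bgp, graph):
    """All mappings mu such that every triple of mu(bgp) is in graph."""
    bindings = [{}]
    for pattern in bgp:
        next_bindings = []
        for b in bindings:
            for triple in graph:
                new = extend(b, pattern, triple)
                if new is not None:
                    next_bindings.append(new)
        bindings = next_bindings
    return bindings


def construct(template, bgp, graph):
    return {
        tuple(mu[x] if is_var(x) else x for x in t)
        for mu in eval_bgp(bgp, graph)
        for t in template
    }


def transfer(statements, graph):
    """T_C(G): union of [[Q_C]]_G over all statements C = (P1, P2), Q_C = (P1, P1 ∪ P2)."""
    result = set()
    for pattern, condition in statements:
        result |= construct(pattern, pattern + condition, graph)
    return result


C_dir = ([("?m", "dir", "tarantino")], [])
C_act = ([("?m", "act", "?a")], [("?m", "dir", "tarantino")])
P_tilde = {("m", "dir", "tarantino"), ("m", "act", "tarantino")}

print(transfer([C_dir, C_act], P_tilde) == P_tilde)  # True
print(transfer([C_dir], P_tilde))                    # {('m', 'dir', 'tarantino')}
```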

Page 33: Managing Completeness of Web Data

Completeness Entailment Theorem

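$\tilde{P}_{dir+act} = T_{\mathcal{C}_{dir,act}}(\tilde{P}_{dir+act})$

Theorem (Completeness of Basic Queries). Let $\mathcal{C}$ be a set of completeness statements and $Q = (W, P)$ a basic query. Then

$\mathcal{C} \models Compl(Q)$ if and only if $\tilde{P} = T_{\mathcal{C}}(\tilde{P})$.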
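Putting the freeze mapping and the transfer operator together gives a direct check of the theorem's condition. This Python sketch uses the same encodings as above (hypothetical helper names) and only covers basic queries, the case the theorem is stated for. Since every template triple of $Q_C$ also occurs in its WHERE pattern, $T_{\mathcal{C}}(\tilde{P})$ is always a subset of $\tilde{P}$, so the equality test amounts to checking that no triple of $\tilde{P}$ is lost.

```python
# Sketch only: statement Compl(P1 | P2) = (P1, P2); query = (W, P).

def is_var(term):
    return term.startswith("?")


def extend(binding, pattern, triple):
    """Extend `binding` so that `pattern` maps onto `triple`, or return None."""
    result = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if result.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return result


def eval_bgp(bgp, graph):
    """All mappings mu such that every triple of mu(bgp) is in graph."""
    bindings = [{}]
    for pattern in bgp:
        next_bindings = []
        for b in bindings:
            for triple in graph:
                new = extend(b, pattern, triple)
                if new is not None:
                    next_bindings.append(new)
        bindings = next_bindings
    return bindings


def construct(template, bgp, graph):
    return {
        tuple(mu[x] if is_var(x) else x for x in t)
        for mu in eval_bgp(bgp, graph)
        for t in template
    }


def transfer(statements, graph):
    result = set()
    for pattern, condition in statements:
        result |= construct(pattern, pattern + condition, graph)
    return result


def freeze(bgp):
    """Prototypical graph: replace each variable ?v by the IRI v (assumed fresh)."""
    return {tuple(t[1:] if is_var(t) else t for t in triple) for triple in bgp}


def entails(statements, query):
    """Check C |= Compl(Q) for a basic query via the test P~ = T_C(P~)."""
    _, bgp = query
    prototypical = freeze(bgp)
    return transfer(statements, prototypical) == prototypical


C_dir = ([("?m", "dir", "tarantino")], [])
C_act = ([("?m", "act", "?a")], [("?m", "dir", "tarantino")])
Q_dir = (["?m"], [("?m", "dir", "tarantino")])
Q_dir_act = (["?m"], [("?m", "dir", "tarantino"), ("?m", "act", "tarantino")])

print(entails([C_dir, C_act], Q_dir_act))  # True
print(entails([C_dir], Q_dir_act))         # False
print(entails([C_dir], Q_dir))             # True
```

Note that the check runs on the small graph $\tilde{P}$ rather than on any actual data source, which is what makes the entailment decidable by a single evaluation.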

Page 34: Managing Completeness of Web Data

Query Class: DISTINCT Queries

Give us all Oscar-winning things:

$Q_{awd} = (W_{awd}, P_{awd})^d = (\{?m\}, \{(?m, award, oscar), (?m, award, ?aw)\})^d$

Complete for all Oscar-winning things:

$C_{os} = Compl((?m, award, oscar) \mid \emptyset)$

Does $\{C_{os}\} \models Compl(Q_{awd})$ hold?
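The DISTINCT marker matters here because the second triple pattern, (?m, award, ?aw), only affects how often ?m is returned. The Python sketch below uses a made-up data source (film1, oscar, bafta are hypothetical) that satisfies $C_{os}$: without DISTINCT, the answer bags over $G^i$ and $G^a$ differ, while with DISTINCT the answer sets coincide. A single data source illustrates the effect; it does not by itself prove entailment. Helper names are hypothetical.

```python
from collections import Counter

# Sketch only: triples are 3-tuples of strings, variables start with "?".

def is_var(term):
    return term.startswith("?")


def extend(binding, pattern, triple):
    """Extend `binding` so that `pattern` maps onto `triple`, or return None."""
    result = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if result.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return result


def eval_bgp(bgp, graph):
    """All mappings mu such that every triple of mu(bgp) is in graph."""
    bindings = [{}]
    for pattern in bgp:
        next_bindings = []
        for b in bindings:
            for triple in graph:
                new = extend(b, pattern, triple)
                if new is not None:
                    next_bindings.append(new)
        bindings = next_bindings
    return bindings


def answers(query, graph, distinct=False):
    """Bag of answers (Counter), or a set if `distinct` is True."""
    variables, bgp = query
    rows = [tuple(mu[v] for v in variables) for mu in eval_bgp(bgp, graph)]
    return set(rows) if distinct else Counter(rows)


# Hypothetical data: film1 won an Oscar and a BAFTA; G^a lacks the BAFTA.
G_i = {("film1", "award", "oscar"), ("film1", "award", "bafta")}
G_a = {("film1", "award", "oscar")}

# The source satisfies C_os: every Oscar triple of G^i is also in G^a.
oscar_triples = {
    (mu["?m"], "award", "oscar") for mu in eval_bgp([("?m", "award", "oscar")], G_i)
}
satisfies_c_os = oscar_triples <= G_a

Q_awd = (["?m"], [("?m", "award", "oscar"), ("?m", "award", "?aw")])

print(satisfies_c_os)       # True
print(answers(Q_awd, G_i))  # Counter({('film1',): 2})
print(answers(Q_awd, G_a))  # Counter({('film1',): 1})
print(answers(Q_awd, G_i, distinct=True) == answers(Q_awd, G_a, distinct=True))  # True
```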

Page 35: Managing Completeness of Web Data

Query Class: OPT Queries

Give us all movies, and their awards, if any:

$Q_{maw} = (\{?m, ?aw\}, ((?m, a, Movie)\ \mathrm{OPT}\ (?m, award, ?aw)))$

Complete for all movies and their awards:

$C_{aw} = Compl((?m, a, Movie), (?m, award, ?aw) \mid \emptyset)$

Does $\{C_{aw}\} \models Compl(Q_{maw})$ hold?
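One way to probe the OPT question is to look for a data source that satisfies $C_{aw}$ but not $Compl(Q_{maw})$. The Python sketch below assumes standard SPARQL OPTIONAL semantics (a left outer join, with unbound variables shown as None) and uses made-up data: film1 has no award, so $C_{aw}$ says nothing about it, yet it is answered over $G^i$ only. Helper names are hypothetical. Since entailment requires every source satisfying the statements to satisfy $Compl(Q)$, one such source is enough to rule it out.

```python
from collections import Counter

# Sketch only: triples are 3-tuples of strings, variables start with "?".

def is_var(term):
    return term.startswith("?")


def extend(binding, pattern, triple):
    """Extend `binding` so that `pattern` maps onto `triple`, or return None."""
    result = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if result.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return result


def eval_bgp(bgp, graph):
    """All mappings mu such that every triple of mu(bgp) is in graph."""
    bindings = [{}]
    for pattern in bgp:
        next_bindings = []
        for b in bindings:
            for triple in graph:
                new = extend(b, pattern, triple)
                if new is not None:
                    next_bindings.append(new)
        bindings = next_bindings
    return bindings


def construct(template, bgp, graph):
    return {
        tuple(mu[x] if is_var(x) else x for x in t)
        for mu in eval_bgp(bgp, graph)
        for t in template
    }


def compatible(mu1, mu2):
    """Two mappings are compatible if they agree on their shared variables."""
    return all(mu2[v] == t for v, t in mu1.items() if v in mu2)


def eval_opt(left, right, graph):
    """Left outer join: keep each left answer, extended where the right BGP matches."""
    right_answers = eval_bgp(right, graph)
    result = []
    for mu1 in eval_bgp(left, graph):
        joined = [{**mu1, **mu2} for mu2 in right_answers if compatible(mu1, mu2)]
        result.extend(joined if joined else [mu1])
    return result


def answers(variables, left, right, graph):
    # Unbound variables are shown as None.
    return Counter(tuple(mu.get(v) for v in variables) for mu in eval_opt(left, right, graph))


# Hypothetical data: film1 has no award in the ideal graph.
G_i = {("film1", "a", "Movie"), ("film2", "a", "Movie"), ("film2", "award", "oscar")}
G_a = {("film2", "a", "Movie"), ("film2", "award", "oscar")}

P_aw = [("?m", "a", "Movie"), ("?m", "award", "?aw")]  # pattern of C_aw, empty condition
satisfies_c_aw = construct(P_aw, P_aw, G_i) <= G_a

variables = ["?m", "?aw"]
left, right = [("?m", "a", "Movie")], [("?m", "award", "?aw")]

print(satisfies_c_aw)  # True
print(answers(variables, left, right, G_i) == answers(variables, left, right, G_a))  # False
```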

Page 36: Managing Completeness of Web Data

Query Class: Queries under RDFS Semantics

Give us all films:

$Q_{film} = (\{?m\}, \{(?m, a, Film)\})$

Complete for all movies:

$C_{movie} = Compl((?m, a, Movie) \mid \emptyset)$

Films are the same as movies:

$S_{fm} = \{(Film, subclass, Movie), (Movie, subclass, Film)\}$

Does $\{C_{movie}\} \models Compl(Q_{film})$ hold wrt. $S_{fm}$?
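Under RDFS semantics, the two subclass triples of $S_{fm}$ make Film and Movie interchangeable as classes. The Python sketch below closes a graph under just two RDFS rules, subclass transitivity (rdfs11) and type propagation (rdfs9), writing `a` for rdf:type and `subclass` for rdfs:subClassOf as on the slide; all other RDFS rules are omitted, and the data (m1, m2) are made up. It only shows that over the closed graph the instances of Film and of Movie coincide; how exactly the completeness check incorporates such closures is an assumption left open here.

```python
def rdfs_closure(graph, schema):
    """Close graph ∪ schema under rdfs11 (subclass transitivity) and rdfs9 (type propagation)."""
    closed = set(graph) | set(schema)
    while True:
        sub = {(s, o) for (s, p, o) in closed if p == "subclass"}
        new = {(a, "subclass", c) for (a, b) in sub for (b2, c) in sub if b == b2}
        new |= {
            (x, "a", sup)
            for (x, p, cls) in closed
            if p == "a"
            for (cls2, sup) in sub
            if cls == cls2
        }
        if new <= closed:  # fixpoint reached
            return closed
        closed |= new


def instances(cls, graph):
    return {x for (x, p, o) in graph if p == "a" and o == cls}


S_fm = {("Film", "subclass", "Movie"), ("Movie", "subclass", "Film")}
G = {("m1", "a", "Movie"), ("m2", "a", "Film")}  # hypothetical data

closed = rdfs_closure(G, S_fm)
print(sorted(instances("Film", G)))       # ['m2']
print(sorted(instances("Film", closed)))  # ['m1', 'm2']
print(instances("Film", closed) == instances("Movie", closed))  # True
```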

Page 37: Managing Completeness of Web Data

Federated Completeness Statements

Page 38: Managing Completeness of Web Data

Timestamped Completeness Statements

Page 39: Managing Completeness of Web Data

Conclusions

Page 40: Managing Completeness of Web Data

Conclusions

Completeness statements can now be represented in RDF

We know how completeness statements can entail query completeness in different query classes and different settings of completeness statements

Page 41: Managing Completeness of Web Data

Future Work

Completeness statements for queries with negation

Completeness statements as session annotations for RDF streams

Statistical completeness reasoning

Page 42: Managing Completeness of Web Data

Publications

Fariz Darari, Werner Nutt, Giuseppe Pirrò, Simon Razniewski: Completeness Statements about RDF Data Sources and Their Use for Query Answering. ISWC 2013.

Fariz Darari, Radityo Eko Prasojo, Werner Nutt: CORNER: A Completeness Reasoner for SPARQL Queries Over RDF Data Sources. ESWC Posters & Demos 2014.

Fariz Darari, Simon Razniewski, Werner Nutt: Bridging the Semantic Gap between RDF and SPARQL using Completeness Statements. ISWC Posters & Demos 2014.

Fariz Darari, Radityo Eko Prasojo, Werner Nutt: Expressing No-Value Information in RDF. ISWC Posters & Demos 2015.

The latest results (timestamped statements and efficient completeness reasoning with 1 million statements) have been submitted to a journal.

Page 43: Managing Completeness of Web Data

$Compl((myDaSePresentation, slide, ?s) \mid \emptyset)$

Thank You!
