Query Answering in Probabilistic Datalog+/– Ontologies under Group Preferences


Thomas Lukasiewicz, Maria Vanina Martinez, Gerardo I. Simari, and Oana Tifrea-Marciuska

Department of Computer Science, University of Oxford, UK

November 18, 2013

Oana Tifrea-Marciuska Query Answering in Probabilistic Datalog+/– Ontologies under Group Preferences slide 1 /29

Outline

Motivation

Preliminaries: Datalog+/–, The Chase

Our model: Components, Strategies to Answer k-rank Disjunctive Atomic Queries


Motivation

- The Web has been shifting towards the Social Semantic Web.
  - We need a model for groups of users that can handle:
    - qualitative preferences of users
    - disagreement between users
    - efficiency
- Uncertainty is present on the Web (e.g., information integration).
  - We need to model uncertainty.

Desire: an ontology language that handles the preferences of a group of users and can handle uncertainty.

SUM2013

Image source: www.boundless.com

Datalog+/– (1/3)

- A database (instance) D for a relational schema R is a set of atoms:

D = {farm(f1), farm(f2), capital(c1), capital(c2), capital(c3), beach(b1), hasAct(f1, act1), hasAct(f2, act1), hasAct(c1, act2), hasAct(c2, act3), hasAct(c3, act1), hasAct(c3, act4), hasAct(c3, act5)}.

- A tuple-generating dependency (TGD) is a constraint of the form ∀X∀Y Φ(X, Y) → ∃Z Ψ(X, Z), where Φ(X, Y) and Ψ(X, Z) are conjunctions of atoms over R, called the body and the head, respectively:

dest(X) → ∃Y hasAct(X, Y)

- A conjunctive query (CQ) over R has the form Q(X) = ∃Y Φ(X, Y), where Φ(X, Y) is a conjunction of atoms:

Q(X) = farm(X) ∧ hasAct(X, act1).

- A Boolean CQ (BCQ) over R is a CQ of the form Q(), often written as the set of all its atoms, without quantifiers:

Q() = ∃X farm(X) ∧ hasAct(X, act1).


Datalog+/– (2/3)

- Answers to CQs and BCQs are defined via homomorphisms, which are mappings µ : ∆ ∪ ∆N ∪ V → ∆ ∪ ∆N ∪ V.

- The set of all answers Q(D) is the set of all tuples t over data constants such that there exists a homomorphism µ : X ∪ Y → ∆ ∪ ∆N with µ(Φ(X, Y)) ⊆ D and µ(X) = t.

D = {farm(f1), farm(f2), capital(c1), capital(c2), capital(c3), beach(b1), hasAct(f1, act1), hasAct(f2, act1), hasAct(c1, act2), hasAct(c2, act3), hasAct(c3, act1), hasAct(c3, act4), hasAct(c3, act5)}.

For Q(X) = farm(X) ∧ hasAct(X, act1), the set of all answers over D is Q(D) = {f1, f2}, since both farms have activity act1 in D.

For Q() = ∃X farm(X) ∧ hasAct(X, act1), the answer is YES.
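The homomorphism-based definition of answers can be illustrated with a small brute-force sketch. It is only an illustration, not part of the talk: the atom encoding as (predicate, arguments) pairs, the helper names `is_var` and `answers`, and the convention that variables start with an uppercase letter are all assumptions made here. On the database D above, both f1 and f2 satisfy the query, because farm(f2) and hasAct(f2, act1) are in D.

```python
from itertools import product

# Database D from the slide: each atom is a (predicate, arguments) pair.
D = {
    ("farm", ("f1",)), ("farm", ("f2",)),
    ("capital", ("c1",)), ("capital", ("c2",)), ("capital", ("c3",)),
    ("beach", ("b1",)),
    ("hasAct", ("f1", "act1")), ("hasAct", ("f2", "act1")),
    ("hasAct", ("c1", "act2")), ("hasAct", ("c2", "act3")),
    ("hasAct", ("c3", "act1")), ("hasAct", ("c3", "act4")),
    ("hasAct", ("c3", "act5")),
}

def is_var(term):
    # Assumed convention: variables start with an uppercase letter.
    return term[0].isupper()

def answers(query_atoms, answer_vars, db):
    """Enumerate all mappings of query variables to constants of db and
    keep those that send every query atom to an atom of db."""
    constants = sorted({c for _, args in db for c in args})
    variables = sorted({t for _, args in query_atoms for t in args if is_var(t)})
    result = set()
    for values in product(constants, repeat=len(variables)):
        mu = dict(zip(variables, values))
        image = {(p, tuple(mu.get(t, t) for t in args)) for p, args in query_atoms}
        if image <= db:  # mu(Phi(X, Y)) is a subset of D
            result.add(tuple(mu[v] for v in answer_vars))
    return result

Q = [("farm", ("X",)), ("hasAct", ("X", "act1"))]
print(sorted(answers(Q, ["X"], D)))  # [('f1',), ('f2',)]
print(bool(answers(Q, [], D)))       # True, i.e. the BCQ answer is YES
```

The exhaustive enumeration is exponential in the number of query variables; it is chosen for readability, not efficiency.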


Datalog+/– (3/3)

- For a database D for R and a set of TGDs Σ on R, the set of models of D and Σ, denoted mods(D, Σ), is the set of all (possibly infinite) databases B such that
  - D ⊆ B, and
  - every σ ∈ Σ is satisfied in B.

- The set of answers for a CQ Q to D and Σ, denoted ans(Q, D, Σ), is the set of all tuples a such that a ∈ Q(B) for all B ∈ mods(D, Σ).

- A TGD σ is guarded iff it contains an atom in its body that contains all universally quantified variables of σ.

σ1 : P(X) ∧ R(X, Y) ∧ Q(Y) → ∃Z R(Y, Z) is guarded: YES (the atom R(X, Y) is the guard).
σ2 : R(X, Y) ∧ R(Y, Z) → R(X, Z) is guarded: NO (no body atom contains X, Y, and Z).

If Σ consists of guarded TGDs, CQs can be evaluated on a fragment of the chase of constant depth k · |Q|, which gives PTIME data complexity.
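The guardedness condition is purely syntactic, so it can be checked directly on the body of a TGD. The following is a minimal sketch under assumptions made here: atoms are (predicate, arguments) pairs, variables start with an uppercase letter, and the helper name `is_guarded` is hypothetical.

```python
def is_var(term):
    # Assumed convention: variables start with an uppercase letter.
    return term[0].isupper()

def is_guarded(body):
    """Return True iff some body atom contains every variable of the body
    (the universally quantified variables of the TGD)."""
    all_vars = {t for _, args in body for t in args if is_var(t)}
    return any(all_vars <= set(args) for _, args in body)

# Bodies of the two TGDs from the slide.
sigma1_body = [("P", ("X",)), ("R", ("X", "Y")), ("Q", ("Y",))]
sigma2_body = [("R", ("X", "Y")), ("R", ("Y", "Z"))]

print(is_guarded(sigma1_body))  # True: R(X, Y) contains X and Y
print(is_guarded(sigma2_body))  # False: no atom contains X, Y and Z
```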


The Chase

D ∪ Σ |= Q iff chase(D, Σ) |= Q.

Example. Let O = (D, Σ) be an ontology describing travel activities:

Σ = {farm(X) → dest(X), capital(A) → dest(A), beach(X) → dest(X), dest(X) → ∃Y hasAct(X, Y)};

D = {farm(f1), farm(f2), capital(c1), capital(c2), capital(c3), beach(b1), hasAct(f1, act1), hasAct(f2, act1), hasAct(c1, act2), hasAct(c2, act4), hasAct(c3, act5)}.

Applying the TGDs of Σ to D step by step adds the following atoms:

chase(D, Σ) = D ∪ {dest(f1), dest(f2), dest(c1), dest(c2), dest(c3), dest(b1), hasAct(b1, z1), ...}
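The chase steps of this example can be reproduced with a short sketch. It makes several assumptions that are not in the slides: every TGD has a single body atom (true for this Σ), each TGD is encoded as a hypothetical triple (body predicate, head predicate, has-existential flag), and the restricted chase variant is used, which fires the existential rule only when no witness exists yet. Under that variant only b1 receives a fresh null; the trailing "..." on the slide is left uninterpreted.

```python
from itertools import count

# Database D of the travel ontology: (predicate, arguments) pairs.
D = {
    ("farm", ("f1",)), ("farm", ("f2",)),
    ("capital", ("c1",)), ("capital", ("c2",)), ("capital", ("c3",)),
    ("beach", ("b1",)),
    ("hasAct", ("f1", "act1")), ("hasAct", ("f2", "act1")),
    ("hasAct", ("c1", "act2")), ("hasAct", ("c2", "act4")),
    ("hasAct", ("c3", "act5")),
}

# Hypothetical encoding of Sigma: (body predicate, head predicate, existential?).
# With existential=True the head is head_pred(X, Z) for a fresh null Z.
SIGMA = [
    ("farm", "dest", False),
    ("capital", "dest", False),
    ("beach", "dest", False),
    ("dest", "hasAct", True),
]

def chase(db, sigma):
    """Restricted chase for single-body-atom TGDs; repeats until no rule fires."""
    db = set(db)
    fresh = count(1)  # generator of null names z1, z2, ...
    changed = True
    while changed:
        changed = False
        for body_pred, head_pred, existential in sigma:
            for pred, args in sorted(db):  # iterate over a snapshot
                if pred != body_pred:
                    continue
                x = args[0]
                if existential:
                    # Fire only if x has no head_pred witness yet.
                    if any(p == head_pred and a[0] == x for p, a in db):
                        continue
                    db.add((head_pred, (x, f"z{next(fresh)}")))
                    changed = True
                elif (head_pred, (x,)) not in db:
                    db.add((head_pred, (x,)))
                    changed = True
    return db

for atom in sorted(chase(D, SIGMA) - D):
    print(atom)  # the six dest atoms and hasAct(b1, z1)
```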


Group Preference Model

- A preference relation is a binary relation ≻ ⊆ HPref × HPref.

- A user preference model U induces a preference relation over a subset of HOnt, denoted ≻U.

[Figure: preference graph of one user over dest(f1), dest(b1), dest(c1), dest(f2), dest(c3), dest(c2).]


Group Preference Model

Definition. A group preference model U = (U1, . . . , Un) for n > 1 users is a collection of n user preference models.

[Figure: preference graphs of three users u1, u2, u3, each over dest(f1), dest(f2), dest(c1), dest(c2), dest(c3), dest(b1).]


Probabilistic Model

- A preference relation ≻ is score-based if it is defined as follows:

a1 ≻ a2 iff score(a1) > score(a2).

- The model assigns a probability to each atom (using, e.g., Markov logic or Bayesian networks).

PrM:

atom       probability
dest(b1)   0.8
dest(c1)   0.75
dest(f1)   0.6
dest(c2)   0.4
dest(f2)   0.34
dest(c3)   0.3
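A score-based preference relation is fully determined by the score function, as the following small sketch shows. The pairing of atoms with probabilities is a reading of the PrM figure and should be treated as an assumption, as are the names `SCORE` and `prefers`.

```python
# Assumed reading of the PrM figure: probability of each atom.
SCORE = {
    "dest(b1)": 0.8, "dest(c1)": 0.75, "dest(f1)": 0.6,
    "dest(c2)": 0.4, "dest(f2)": 0.34, "dest(c3)": 0.3,
}

def prefers(a1, a2, score=SCORE):
    """a1 is preferred to a2 iff score(a1) > score(a2)."""
    return score[a1] > score[a2]

# The induced ranking, from most to least probable atom.
ranking = sorted(SCORE, key=SCORE.get, reverse=True)
print(ranking)
print(prefers("dest(b1)", "dest(c3)"))  # True
```

Because the relation is induced by a numeric score, it is irreflexive and transitive by construction.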


Challenges of the given model 1/2

[Figure: preference graphs of users u1, u2, u3 next to the probabilistic model PrM, with probabilities dest(b1) 0.8, dest(c1) 0.75, dest(f1) 0.6, dest(c2) 0.4, dest(f2) 0.34, dest(c3) 0.3.]

- Challenge 1: the user preference model and the probabilistic model may be in disagreement. This is addressed by preference merging operators.

Definition. Let ≻U be a strict partial order (SPO) and ≻M be a score-based preference relation. A preference merging operator ⊗(≻U, ≻M) yields a relation ≻∗ such that

1. ≻∗ is an SPO;
2. if a1 ≻U a2 and a1 ≻M a2, then a1 ≻∗ a2.


Challenges of the given model 2/2

[Figure: preference graphs of users u1, u2, u3 next to the probabilistic model PrM, with probabilities dest(b1) 0.8, dest(c1) 0.75, dest(f1) 0.6, dest(c2) 0.4, dest(f2) 0.34, dest(c3) 0.3.]

- Challenge 2: user preference models may be in disagreement with each other. This is addressed by a preference aggregation operator.

Definition. Let U = (U1, . . . , Un) be a group preference model, where every Ui is an SPO. A preference aggregation operator ⊎ on U yields an SPO ≻∗.


GPP-Datalog+/– ontology - our model

Definition. A GPP-Datalog+/– ontology has the form KB = (O, U, M, ⊗, ⊎), where

- O is a Datalog+/– ontology,
- U = (U1, . . . , Un) is a group preference model with n > 1,
- M is a probabilistic model,
- ⊗ is a preference merging operator,
- ⊎ is a preference aggregation operator.

We say that KB is guarded iff O is guarded.


Merging operator

Input: SPO ≻U, score-based ≻M over HOnt, and a threshold t ∈ [0, 1] that controls the influence of the probabilistic model (t = 0 means the highest influence).
Output: preference relation ≻∗ ⊆ HOnt × HOnt.

1. Initialise G as an empty graph
2. for every pair (a, b) ∈ ≻U do
3.   if score(b) − score(a) > t
4.     add edge (b, a) to G
5.   else
6.     add edge (a, b) to G
7. Return G
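The merging steps above can be sketched in a few lines. This is an illustrative sketch, not the authors' implementation: the function name `merge`, the encoding of the user relation as a set of pairs, and the two example user pairs are assumptions chosen here to match the arithmetic of the example slide (0.4 − 0.6 and 0.75 − 0.6 against t = 0.1); the scores are an assumed reading of the PrM figure.

```python
def merge(user_pref, score, t):
    """Merge a user preference relation with a score-based one.

    user_pref: set of pairs (a, b), meaning the user prefers a to b.
    score:     dict mapping each atom to its probability.
    t:         threshold in [0, 1]; the pair is inverted only if the
               model favours b over a by more than t.
    """
    merged = set()
    for a, b in user_pref:
        if score[b] - score[a] > t:
            merged.add((b, a))  # the model disagrees strongly: invert
        else:
            merged.add((a, b))  # keep the user's preference
    return merged

# Assumed scores (reading of the PrM figure) and hypothetical user pairs.
SCORE = {"dest(b1)": 0.8, "dest(c1)": 0.75, "dest(f1)": 0.6,
         "dest(c2)": 0.4, "dest(f2)": 0.34, "dest(c3)": 0.3}
user = {("dest(f1)", "dest(c2)"), ("dest(f1)", "dest(c1)")}

print(sorted(merge(user, SCORE, 0.1)))
# [('dest(c1)', 'dest(f1)'), ('dest(f1)', 'dest(c2)')]
```

With t = 0 every disagreement with the model inverts the user's pair, and with t = 1 the user's relation is returned unchanged, since no score difference can exceed 1.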


Merging operator (example)

- 0.4 − 0.6 > 0.1? No ⟹ keep the relation.
- 0.75 − 0.6 > 0.1? Yes ⟹ invert the relation.

[Figure: a user preference graph over dest(f1), dest(c1), dest(b1), dest(c2), dest(c3) before and after merging with the probabilistic model PrM (dest(b1) 0.8, dest(c1) 0.75, dest(f1) 0.6, dest(c2) 0.4, dest(f2) 0.34, dest(c3) 0.3).]

Skyline and k-rank answer

Let KB be a GPP-Datalog+/– ontology, and let Q(X) = q1(X1) ∨ · · · ∨ qn(Xn) be a disjunctive atomic query (DAQ). A skyline answer to Q relative to ≻∗ = ⊎(⊗(≻U1, ≻M), . . . , ⊗(≻Un, ≻M)) is any θqi entailed by O such that no θ′ exists with O |= θ′qj and θ′qj ≻∗ θqi, where θ and θ′ are ground substitutions for the variables in Q(X).

A k-rank answer to Q is a sequence S = ⟨θ1, . . . , θk′⟩ built by successively appending the skyline answers to Q, removing these atoms from consideration, and repeating until either |S| = k or no more skyline answers to Q remain.

[Figure: preference graph over dest(f1), dest(b1), dest(c1), dest(f2), dest(c3), dest(c2).]
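The iterative construction of a k-rank answer can be sketched as repeated skyline extraction. The sketch below assumes the entailed atoms and the final relation ≻∗ are already available; the function names, the alphabetical tie-breaking inside a skyline layer, and the example relation (a hypothetical chain that happens to reproduce the ranking used later in the talk) are assumptions made here.

```python
def skyline(atoms, pref):
    """Atoms that no other remaining atom is preferred to.
    pref is a set of pairs (a, b) meaning a is preferred to b."""
    return sorted(a for a in atoms if not any((b, a) in pref for b in atoms))

def k_rank(atoms, pref, k):
    """Append skyline layers until k atoms are collected or none remain."""
    remaining, answer = set(atoms), []
    while remaining and len(answer) < k:
        layer = skyline(remaining, pref)
        if not layer:  # only possible if pref contains a cycle
            break
        answer.extend(layer[: k - len(answer)])  # ties broken alphabetically
        remaining -= set(layer)
    return answer

# Hypothetical aggregated relation over the dest atoms.
ATOMS = {"dest(b1)", "dest(c1)", "dest(f1)", "dest(f2)", "dest(c2)", "dest(c3)"}
PREF = {
    ("dest(b1)", "dest(c1)"), ("dest(c1)", "dest(f1)"),
    ("dest(f1)", "dest(f2)"), ("dest(f2)", "dest(c2)"),
    ("dest(f2)", "dest(c3)"),
}

print(k_rank(ATOMS, PREF, 2))  # ['dest(b1)', 'dest(c1)']
print(k_rank(ATOMS, PREF, 5))  # last position is a tie between dest(c2) and dest(c3)
```

When a skyline layer has more atoms than positions left, any choice among them yields a valid k-rank answer, which is why several answers can exist for the same k.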


Strategies to answer k-rank DAQ

- Collapse to single user
  1. Create a virtual user
  2. Calculate the k-rank answer from it

- Voting
  1. Calculate the k-rank answer for each of the users
  2. Vote


Collapse to single user

Algorithm 1: AggPrefsCSU(≻M, ≻U1, . . . , ≻Un, t)

Input: SPOs (≻U1, . . . , ≻Un), score-based ≻M over HOnt, and t = (t1, . . . , tn) ∈ [0, 1]^n.
Output: preference relation ≻∗ ⊆ HOnt × HOnt.

1. initialize G as an empty graph;
2. add as nodes in G all elements appearing in the pref. relations ≻Ui;
3. for every user i ∈ {1, . . . , n} do
4.   currUserMG = graph obtained by merging ≻Ui with ≻M;
5.   for every edge (s, t) in currUserMG do
6.     if there is no edge (s, t) in G then
7.       add edge (s, t) to G and label it with 1;
8.     if there is an edge (s, t) in G and it is labeled with n > 1 then
9.       increase the label of edge (s, t) in G by 1;
10.    if there is an edge (t, s) in G and it is labeled with 1 then
11.      remove edge (s, t) from G;
12.    if there is an edge (t, s) in G and it is labeled with n > 1 then
13.      decrease the label of edge (t, s) in G by 1;
14. return inducedPrefRelation(removeCycles(transitiveClosure(G))).


Collapse to single user (example)

- Case: no relation
- Case: relation with weight 1

[Figure: merged preference graphs of u1 (t = 0), u2 (t = 0.1), and u3 (t = 0.19) over dest(b1), dest(c1), dest(c2), dest(c3), dest(f1), dest(f2), and the aggregated graph G built from them, with edge labels between 1 and 3.]


Collapse to single user: k-rank

Q = dest(X), (t1, t2, t3) = (0, 0.1, 0.19). The k-rank answers to Q for k = 1, . . . , 5 are:

- k = 1: ⟨dest(b1)⟩
- k = 2: ⟨dest(b1), dest(c1)⟩
- k = 3: ⟨dest(b1), dest(c1), dest(f1)⟩
- k = 4: ⟨dest(b1), dest(c1), dest(f1), dest(f2)⟩
- k = 5: ⟨dest(b1), dest(c1), dest(f1), dest(f2), dest(c2)⟩ or ⟨dest(b1), dest(c1), dest(f1), dest(f2), dest(c3)⟩

[Figure: the aggregated graph with edge labels; at each step the selected skyline atom is removed, leaving dest(c2) and dest(c3) at the last step.]


Collapse to single user (Theorem)

Theorem. Let KB = (O, U, M, ⊗, ⊎) be a GPP-Datalog+/– ontology, Q be a DAQ, and k > 0. If O is a guarded Datalog+/– ontology and the removeCycles subroutine does not remove any unnecessary edges, then Algorithm k-Rank-CSU

- correctly computes k-rank answers to Q, and
- runs in O(poly(|D|) · S + C) time in the data complexity, where S is the cost of computing score(a) = PrKB(a) for any atom a such that O |= a, and C is the cost of removeCycles.


Voting - Plurality voting

Q = dest(X), k = 2, and (t1, t2, t3) = (0, 0.1, 0.19).

         dest(f2)  dest(c1)  dest(c2)  dest(b1)  dest(c3)
u1       1         1         1         1         0
u2       0         1         0         1         0
u3       1         0         0         1         1
Total    2         2         1         3         1

[Figure: merged preference graphs of u1 (t = 0), u2 (t = 0.1), and u3 (t = 0.19) over the dest atoms.]

The k-rank answer to Q using plurality voting is ⟨dest(b1), dest(c1)⟩ or ⟨dest(b1), dest(f2)⟩.
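The totals in the plurality table and the resulting top-2 answer can be recomputed with a short tally. The vote matrix is copied from the table; the function name `plurality` and the alphabetical tie-breaking are assumptions made for this sketch, so it returns only one of the two equally valid answers.

```python
# Votes copied from the plurality table (1 = the user votes for the atom).
VOTES = {
    "u1": {"dest(f2)": 1, "dest(c1)": 1, "dest(c2)": 1, "dest(b1)": 1, "dest(c3)": 0},
    "u2": {"dest(f2)": 0, "dest(c1)": 1, "dest(c2)": 0, "dest(b1)": 1, "dest(c3)": 0},
    "u3": {"dest(f2)": 1, "dest(c1)": 0, "dest(c2)": 0, "dest(b1)": 1, "dest(c3)": 1},
}

def plurality(votes, k):
    """Sum the votes per atom and return (totals, top-k atoms).
    Ties are broken alphabetically, which is an arbitrary choice."""
    totals = {}
    for ballot in votes.values():
        for atom, v in ballot.items():
            totals[atom] = totals.get(atom, 0) + v
    ranked = sorted(totals, key=lambda a: (-totals[a], a))
    return totals, ranked[:k]

totals, top2 = plurality(VOTES, 2)
print(totals)  # dest(b1) has 3 votes; dest(c1) and dest(f2) have 2 each
print(top2)    # ['dest(b1)', 'dest(c1)']
```

Since dest(c1) and dest(f2) are tied with two votes each, a different tie-breaking rule would return ⟨dest(b1), dest(f2)⟩ instead.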


Voting - Least misery

Q = dest(X), k = 2, and (t1, t2, t3) = (0, 0.1, 0.19).

Eliminate dest(f2), dest(c2), and dest(c3), since they are least preferred. The remaining candidates are:

         dest(b1)  dest(c1)
u1       1         1
u2       1         1
u3       1         0
Total    3         2

The k-rank answer to Q using least misery voting is ⟨dest(b1), dest(c1)⟩.

[Figure: merged preference graphs of u1 (t = 0), u2 (t = 0.1), and u3 (t = 0.19) over the dest atoms.]


Summary

- We presented an extension of Datalog+/– that deals with both partially ordered preferences of groups of users and probabilistic uncertainty.

- We have focused on answering k-rank queries for DAQs (disjunctions of atomic queries) in this context.

- We presented different operators to compute group preferences: merging combines the preferences of single users with probability-based preferences, and aggregation combines the users' preferences with each other.

- We have then provided algorithms to answer k-rank queries for DAQs under these group preferences.

- We have shown that, under certain reasonable conditions, such DAQ answering in Datalog+/– can be done in polynomial time in the data complexity.


Future work

- Implement and test the GPP-Datalog+/– framework.

- Explore which of the merging/aggregation operators is closest to human judgment and thus well-suited as a general default merging/aggregation operator for search and query answering in the Social Semantic Web.


THANK YOU

Questions?