Query Answering in Probabilistic Datalog+/– Ontologies under Group Preferences


Thomas Lukasiewicz, Maria Vanina Martinez, Gerardo I. Simari, and Oana Tifrea-Marciuska

Department of Computer Science, University of Oxford, UK

July 5, 2013

Oana Tifrea-Marciuska Query Answering in Probabilistic Datalog+/– Ontologies under Group Preferences slide 1 /27


Outline

◮ Introduction

◮ Datalog+/–
    ◮ Databases and Queries
    ◮ The Chase

◮ GPP-Datalog+/–
    ◮ Group Preference Model
    ◮ Probabilistic Model
    ◮ Preference Merging and Aggregation
    ◮ Strategies to Answer k-rank Disjunctive Atomic Queries



Motivation

◮ Web → Social Semantic Web

◮ Model a group of users (e.g., movie night, trip) in a way that can handle
    ◮ qualitative preferences of users
    ◮ disagreement between users
    ◮ efficiency
  (our previous work in SUM 2013)

◮ Model uncertainty (e.g., information integration from travel sites)

◮ Desire: an ontology language that handles the preferences of a group of users and can handle uncertainty


Datalog+/– (1/3)

◮ A database (instance) D for R is a (possibly infinite) set of atoms with predicates from a finite set of predicate symbols R and arguments from a set of data constants ∆.

D = {sport(s1), sport(s2), relax(r1), relax(r2), adv(a1), adv(a2), museum(m1), museum(m2), park(p1), free_entrance(p1)}.

◮ A conjunctive query (CQ) over R has the form Q(X) = ∃Y Φ(X,Y), where Φ(X,Y) is a conjunction of atoms.

Q(X) = park(X) ∧ free_entrance(X).

◮ A Boolean CQ (BCQ) over R is a CQ of the form Q(), often written as the set of all its atoms, without quantifiers.

Q() = ∃X park(X) ∧ free_entrance(X).



Datalog+/– (2/3)

◮ Answers to CQs and BCQs are defined via homomorphisms, which are mappings µ : ∆ ∪ ∆N ∪ V → ∆ ∪ ∆N ∪ V such that

1. c ∈ ∆ (set of constants) implies µ(c) = c,
2. c ∈ ∆N (set of labelled nulls) implies µ(c) ∈ ∆ ∪ ∆N,
3. µ is naturally extended to atomic formulas, sets of atomic formulas, and conjunctions of atomic formulas.

◮ The set of all answers Q(D) is the set of all tuples t over data constants for which there exists a homomorphism µ : X ∪ Y → ∆ ∪ ∆N such that µ(Φ(X,Y)) ⊆ D and µ(X) = t.

D = {sport(s1), sport(s2), relax(r1), relax(r2), adv(a1), adv(a2), museum(m1), museum(m2), park(p1), free_entrance(p1)}.

For Q(X) = park(X) ∧ free_entrance(X), the set of all answers over D is Q(D) = {p1}.

For Q() = ∃X park(X) ∧ free_entrance(X), the answer is YES.
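The homomorphism-based definition of answers can be illustrated with a brute-force search over all variable assignments. This is only a minimal sketch: the tuple encoding of atoms, the uppercase-means-variable convention, and the function names `is_var` and `answers` are assumptions made for this example, not part of the talk. The database contains no labelled nulls, so every answer tuple consists of constants.

```python
from itertools import product

# Database D from the slide; an atom is encoded as (predicate, args).
D = {
    ("sport", ("s1",)), ("sport", ("s2",)),
    ("relax", ("r1",)), ("relax", ("r2",)),
    ("adv", ("a1",)), ("adv", ("a2",)),
    ("museum", ("m1",)), ("museum", ("m2",)),
    ("park", ("p1",)), ("free_entrance", ("p1",)),
}

def is_var(term):
    # Assumed convention: variables start with an uppercase letter.
    return term[0].isupper()

def answers(query_atoms, answer_vars, db):
    """All tuples mu(answer_vars) such that mu maps every query atom into db."""
    variables = sorted({t for _, args in query_atoms for t in args if is_var(t)})
    domain = sorted({c for _, args in db for c in args})
    result = set()
    for values in product(domain, repeat=len(variables)):
        mu = dict(zip(variables, values))
        # Apply mu to the query; constants are mapped to themselves.
        image = {(p, tuple(mu.get(t, t) for t in args)) for p, args in query_atoms}
        if image <= db:
            result.add(tuple(mu[v] for v in answer_vars))
    return result

Q = [("park", ("X",)), ("free_entrance", ("X",))]
print(answers(Q, ["X"], D))     # {('p1',)}
print(bool(answers(Q, [], D)))  # True, i.e. the BCQ answer is YES
```

The search is exponential in the number of query variables, which is acceptable for a toy example but is not how a real query engine would evaluate CQs.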



Datalog+/– (3/3)

◮ Tuple-generating dependency (TGD): a constraint of the form ∀X∀Y Φ(X,Y) → ∃Z Ψ(X,Z), where Φ(X,Y) and Ψ(X,Z) are conjunctions of atoms over a set of predicates R, called the body and the head, respectively.

museum(X) → SS(X)

◮ For a database D for R and a set of TGDs Σ on R, the set of models of D and Σ, denoted mods(D,Σ), is the set of all (possibly infinite) databases B such that
    ◮ D ⊆ B and
    ◮ every σ ∈ Σ is satisfied in B.

◮ The set of answers for a CQ Q to D and Σ, denoted ans(Q,D,Σ), is the set of all tuples a such that a ∈ Q(B) for all B ∈ mods(D,Σ).



The Chase

◮ The chase is a procedure for repairing a DB relative to a set of dependencies.

◮ D ∪ Σ |= Q iff chase(D,Σ) |= Q.

◮ A TGD σ is guarded iff it contains an atom in its body that contains all universally quantified variables of σ.

σ1 : P(X) ∧ R(X,Y) ∧ Q(Y) → ∃Z R(Y,Z)    guarded: YES (the guard is R(X,Y)).
σ2 : R(X,Y) ∧ R(Y,Z) → R(X,Z)    guarded: NO.

If Σ consists of guarded TGDs, CQs can be evaluated on a fragment of the chase of constant depth k · |Q|, which takes PTIME in the data complexity.
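The guardedness condition can be checked mechanically. The sketch below assumes a hypothetical encoding in which a TGD body is a list of (predicate, args) pairs and variables start with an uppercase letter; since the universally quantified variables of a TGD are exactly its body variables, the body alone is enough for the test.

```python
def is_var(term):
    # Assumed convention: variables start with an uppercase letter.
    return term[0].isupper()

def is_guarded(body):
    """True iff some body atom contains every variable that occurs in the body."""
    all_vars = {t for _, args in body for t in args if is_var(t)}
    return any(all_vars <= set(args) for _, args in body)

# Bodies of sigma1 and sigma2 from the slide.
sigma1_body = [("P", ("X",)), ("R", ("X", "Y")), ("Q", ("Y",))]
sigma2_body = [("R", ("X", "Y")), ("R", ("Y", "Z"))]

print(is_guarded(sigma1_body))  # True: R(X,Y) is a guard
print(is_guarded(sigma2_body))  # False: no atom contains X, Y, and Z
```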



TGD Chase

Informally, a TGD σ is applicable in a DB D if body(σ) maps to atoms in D. If not already in D, the application of σ on D adds an atom with fresh nulls corresponding to each existentially quantified variable in head(σ).

Example. Let O = (D,Σ) be an ontology describing travel activities:

Σ = {museum(X) → SS(X), park(A) → SS(A),
     SS(A) → act(A), relax(X) → act(X),
     adv(X) → act(X), sport(X) → act(X),
     adv(X) → ∃Y requireEquip(X,Y)};

D = {sport(s1), sport(s2), relax(r1), relax(r2), adv(a1), adv(a2), museum(m1), museum(m2), park(p1)}

The chase adds the derived atoms one at a time and ends with

chase(D,Σ) = D ∪ {SS(m1), SS(m2), SS(p1),
                  act(r1), act(r2), act(a1), act(a2), act(s1), act(s2), act(m1), act(m2), act(p1),
                  requireEquip(a1, e1), requireEquip(a2, e2)},

where e1 and e2 are fresh nulls.
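The chase of the travel ontology can be reproduced with a small fixpoint loop. This is a simplified sketch, not the general procedure: it assumes every TGD has exactly one body atom and one head atom with distinct variables as arguments (true for the Σ above), encodes atoms as (predicate, args) pairs, and names fresh nulls `_e1`, `_e2`, ... (the slide calls them e1 and e2). The function name `chase` and the encoding are choices made for this example.

```python
from itertools import count

def chase(db, tgds):
    """Restricted chase for TGDs with a single body atom and a single head atom."""
    db = set(db)
    fresh = count(1)
    changed = True
    while changed:
        changed = False
        for (bp, bargs), (hp, hargs) in tgds:
            for pred, args in sorted(db):  # iterate over a snapshot
                if pred != bp or len(args) != len(bargs):
                    continue
                mu = dict(zip(bargs, args))  # body variables -> terms
                # The head is already satisfied if some atom agrees with it
                # on every position that holds a body variable.
                satisfied = any(
                    p == hp and len(a) == len(hargs)
                    and all(v not in mu or mu[v] == c for v, c in zip(hargs, a))
                    for p, a in db
                )
                if not satisfied:
                    # Existential variables receive fresh nulls.
                    head = tuple(mu[v] if v in mu else f"_e{next(fresh)}" for v in hargs)
                    db.add((hp, head))
                    changed = True
    return db

D = {("sport", ("s1",)), ("sport", ("s2",)), ("relax", ("r1",)), ("relax", ("r2",)),
     ("adv", ("a1",)), ("adv", ("a2",)), ("museum", ("m1",)), ("museum", ("m2",)),
     ("park", ("p1",))}

SIGMA = [
    (("museum", ("X",)), ("SS", ("X",))),
    (("park", ("A",)), ("SS", ("A",))),
    (("SS", ("A",)), ("act", ("A",))),
    (("relax", ("X",)), ("act", ("X",))),
    (("adv", ("X",)), ("act", ("X",))),
    (("sport", ("X",)), ("act", ("X",))),
    (("adv", ("X",)), ("requireEquip", ("X", "Y"))),  # Y is existential
]

result = chase(D, SIGMA)
print(len(result))  # 23 atoms: 9 from D plus 14 derived
```

The loop terminates here because the example has no recursion through existential variables; for arbitrary TGDs the chase may be infinite, which is why the guarded fragment and its bounded-depth property matter.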



Group Preference Model

◮ A preference relation is a binary relation ≻ ⊆ HPref × HPref.

◮ A user preference model U induces a preference relation over a subset of HOnt, denoted ≻U.

[Figure: preference graph of one user over the atoms act(s1), act(s2), act(a1), act(a2), act(p1), act(m1), act(m2), act(r1), act(r2); the edges are not recoverable from the extracted text.]



Definition. A group preference model U = (U1, . . . , Un) for n > 1 users is a collection of n user preference models.

[Figure: preference graphs of three users u1, u2, and u3 over the atoms act(s1), act(s2), act(a1), act(a2), act(p1), act(m1), act(m2), act(r1), act(r2); the edges are not recoverable from the extracted text.]



Probabilistic Model

◮ A preference relation ≻ is score-based if it is defined as follows: a1 ≻ a2 iff score(a1) > score(a2).

◮ The model assigns a probability to each atom (e.g., using Markov logic or Bayesian networks).

[Figure: probabilistic model PrM assigning the probabilities 0.8, 0.75, 0.6, 0.52, 0.44, 0.4, 0.34, 0.3, and 0.1 to the atoms act(m1), act(p1), act(s2), act(m2), act(r1), act(a1), act(r2), act(a2), and act(s1); the pairing of values and atoms is not recoverable from the extracted text.]



Challenges of the given model

[Figure: the preference graphs of users u1, u2, and u3 shown next to the probabilistic model PrM.]



Preference Merging and Aggregation

◮ Challenge 1: a user preference model and the probabilistic model may be in disagreement: preference merging operators.

◮ Challenge 2: user preference models may be in disagreement with each other: preference aggregation operator.

Definition. Let ≻U be a strict partial order (SPO) and ≻M be a score-based preference relation. A preference merging operator ⊗(≻U, ≻M) yields a relation ≻∗ such that

1. ≻∗ is an SPO,
2. if a1 ≻U a2 and a1 ≻M a2, then a1 ≻∗ a2.

Definition. Let U = (U1, . . . , Un) be a group preference model, where every Ui is an SPO. A preference aggregation operator ⊎ on U yields an SPO ≻∗.



Definition. A GPP-Datalog+/– ontology has the form KB = (O, U, M, ⊗, ⊎), where

◮ O is a Datalog+/– ontology,

◮ U = (U1, . . . , Un) is a group preference model with n > 1,

◮ M is a probabilistic model,

◮ ⊗ is a preference merging operator,

◮ ⊎ is a preference aggregation operator.

O, U, and M have the Herbrand bases HOnt, HPref, and HM, respectively, such that HPref ⊆ HOnt.

We say that KB is guarded iff O is guarded.



DAQ queries

Definition. Let KB = (O, U, M, ⊗, ⊎) be a GPP-Datalog+/– ontology, where U = (U1, . . . , Un), and let Q(X) = q1(X1) ∨ · · · ∨ qn(Xn) be a disjunctive atomic query (DAQ). Then, a skyline answer to Q relative to ≻∗ = ⊎(⊗(≻U1, ≻M), . . . , ⊗(≻Un, ≻M)) is any θqi entailed by O such that no θ′ exists with O |= θ′qj and θ′qj ≻∗ θqi, where θ and θ′ are ground substitutions for the variables in Q(X).

A substitution is a mapping from variables to variables or constants.

◮ Intuitively, a skyline answer is a most preferred answer: no other answer is preferred to it.

◮ A 1-rank answer is a skyline answer.

◮ A 2-rank answer consists of the first and second most preferred answers.
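The skyline and k-rank notions can be sketched as follows. The encoding is an assumption made for this example: `prefers` is a set of pairs (a, b) meaning a ≻∗ b, answers are plain strings, and ties inside one skyline layer are broken alphabetically, which the slides do not prescribe. The toy relation below is hypothetical and is not the relation from the talk.

```python
def skyline(answers, prefers):
    """Answers a for which no other answer b satisfies b >* a."""
    return {a for a in answers if not any((b, a) in prefers for b in answers)}

def k_rank(answers, prefers, k):
    """Collect up to k answers by repeatedly peeling off the current skyline."""
    remaining, ranking = set(answers), []
    while remaining and len(ranking) < k:
        layer = sorted(skyline(remaining, prefers))  # alphabetical tie-break (assumption)
        if not layer:  # cannot happen for an SPO; guards against cyclic input
            break
        ranking.extend(layer[: k - len(ranking)])
        remaining -= set(layer)
    return ranking

# Hypothetical toy relation: act(m1) >* act(p1) >* act(s2), closed under transitivity.
toy_answers = {"act(m1)", "act(p1)", "act(s2)"}
toy_prefers = {("act(m1)", "act(p1)"), ("act(p1)", "act(s2)"), ("act(m1)", "act(s2)")}

print(k_rank(toy_answers, toy_prefers, 1))  # ['act(m1)']
print(k_rank(toy_answers, toy_prefers, 2))  # ['act(m1)', 'act(p1)']
```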



Collapse to single user

◮ t ∈ [0, 1]: the influence of the probabilistic model (t = 0 means high influence).

◮ Example for user u2 with t = 0.1:
    ◮ 0.34 − 0.6 > 0.1? No ⟹ keep the relation.
    ◮ 0.75 − 0.6 > 0.1? Yes ⟹ invert the relation.

[Figure: the probabilistic model PrM, the preference graph of u2, and the graph obtained by merging the two with t = 0.1.]
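The threshold test on this slide can be read as the following merging rule: a user preference a ≻U b is inverted when the probabilistic model favours b by more than t, and kept otherwise. The sketch below is one reading of the slide, not the authors' algorithm. The atom names x, y, and z are placeholders, because the pairing of probabilities and atoms is not recoverable here; only the three probability values are taken from the slide.

```python
def merge(user_prefs, score, t):
    """user_prefs: set of (a, b) meaning the user prefers a to b."""
    merged = set()
    for a, b in user_prefs:
        if score[b] - score[a] > t:
            merged.add((b, a))  # the model disagrees by more than t: invert
        else:
            merged.add((a, b))  # otherwise keep the user's preference
    return merged

# Placeholder atoms carrying the probabilities that appear in the two tests.
score = {"x": 0.6, "y": 0.34, "z": 0.75}
u2_prefs = {("x", "y"), ("x", "z")}

# 0.34 - 0.6 > 0.1 is false, so (x, y) is kept;
# 0.75 - 0.6 > 0.1 is true, so (x, z) is inverted.
print(sorted(merge(u2_prefs, score, 0.1)))  # [('x', 'y'), ('z', 'x')]
```

Note that this pairwise rule alone does not guarantee that the result is an SPO, as the definition of a merging operator requires; a complete operator would also have to restore transitivity and rule out cycles.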



Collapse to single user

◮ Thresholds per user: u1 with t = 0, u2 with t = 0.1, u3 with t = 0.3.

◮ no relation

◮ relation with weight 1

[Figure: merged preference graphs of u1 (t = 0), u2 (t = 0.1), and u3 (t = 0.3) over the atoms act(s1), act(s2), act(a1), act(a2), act(p1), act(m1), act(m2), act(r1), act(r2).]
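One plausible reading of the collapse step is that the merged relations of all users are combined into a single graph in which each edge carries a weight. The sketch below assumes the weight is the number of users who hold that preference; this is an assumption suggested by the weights 1 to 3 in the final graph, not something the slides state. The function name `collapse` and the toy relations are hypothetical.

```python
from collections import Counter

def collapse(user_relations):
    """Count, for every ordered pair (a, b), how many users prefer a to b."""
    weights = Counter()
    for relation in user_relations:
        weights.update(relation)  # each relation is a set of (a, b) pairs
    return dict(weights)

# Hypothetical merged relations of three users over placeholder atoms.
u1 = {("a", "b")}
u2 = {("a", "b"), ("b", "c")}
u3 = {("a", "b")}

print(collapse([u1, u2, u3]))
```

Pairs that no user relates get no edge at all, and a pair related by a single user gets weight 1. The collapsed graph may contain cycles, for example when users disagree, which is where the removeCycles subroutine mentioned in the theorem comes in.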



Collapse to single user (Final Graph)

◮ Q = act(X), (t1, t2, t3) = (0, 0.1, 0.3).

◮ k-rank answers to Q for increasing k:

    k = 1: 〈act(m1)〉
    k = 2: 〈act(m1), act(p1)〉
    k = 3: 〈act(m1), act(p1), act(s2)〉
    k = 4: 〈act(m1), act(p1), act(s2), act(m2)〉
    k = 5: 〈act(m1), act(p1), act(s2), act(m2), act(r2)〉

[Figure: the collapsed preference graph over the atoms act(·) with edge weights between 1 and 3, shown once per value of k; answers that have already been returned no longer appear in the graph.]



Collapse to single user (Theorem)

Theorem. Let KB = (O, U, M, ⊗, ⊎) be a GPP-Datalog+/– ontology, Q be a DAQ, and k > 0. If O is a guarded Datalog+/– ontology and the removeCycles subroutine does not remove any unnecessary edges, then Algorithm k-Rank-CSU

◮ correctly computes k-rank answers to Q,

◮ runs in O(poly(|D|) · S + C) time in the data complexity, where S is the cost of computing score(a) = PrKB(a) for any atom a such that O |= a, and C is the cost of removeCycles.



Voting

◮ Different voting strategies can be used, e.g., plurality voting.

◮ That is, the individual rankings are computed first, and the voting takes place afterwards.

Example: Q = act(X), k = 2, and (t1, t2, t3) = (0, 0.1, 0.3). The k-rank answer to Q using voting is 〈act(m1), act(m2)〉 or 〈act(m1), act(p1)〉.

[Figure: merged preference graphs of u1, u2, and u3 over the atoms act(s1), act(s2), act(a1), act(a2), act(p1), act(m1), act(m2), act(r1), act(r2).]
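Plurality voting over individual rankings can be sketched as below. The slides do not spell out the procedure, so this is only an illustration under assumptions: each user contributes a ranked list, votes for the first candidate that has not been returned yet, and all candidates tied for the most votes are reported. The function name `plurality_round` and the three rankings are hypothetical, chosen so that the second round ends in a tie.

```python
from collections import Counter

def plurality_round(rankings, excluded=()):
    """Each user votes for their best non-excluded candidate; return all top-voted ones."""
    votes = Counter()
    for ranking in rankings:
        for candidate in ranking:
            if candidate not in excluded:
                votes[candidate] += 1
                break  # one vote per user
    if not votes:
        return []
    best = max(votes.values())
    return sorted(c for c, v in votes.items() if v == best)

# Hypothetical individual rankings, best answer first.
rankings = [
    ["act(m1)", "act(m2)", "act(p1)"],  # u1
    ["act(m1)", "act(p1)", "act(m2)"],  # u2
    ["act(m1)"],                        # u3
]

first = plurality_round(rankings)
second = plurality_round(rankings, excluded=set(first))
print(first)   # ['act(m1)']
print(second)  # ['act(m2)', 'act(p1)']
```

A tie in the second round means that either candidate can take the second place, so more than one 2-rank answer is possible.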



Summary

◮ Extension of Datalog+/– that allows for dealing with both partially ordered preferences of groups of users and probabilistic uncertainty.

◮ We have focused on answering k-rank queries for DAQs (disjunctions of atomic queries) in this context.

◮ We have presented different operators to compute group preferences as a merging of the preferences of single users with probability-based preferences and an aggregation of the preferences of single users with each other.

◮ We have then provided algorithms to answer k-rank queries for DAQs under these group preferences.

◮ We have shown that, under certain reasonable conditions, such DAQ answering in Datalog+/– can be done in polynomial time in the data complexity.



Future work

◮ Implementing and testing the GPP-Datalog+/– framework.

◮ Exploring which of the merging/aggregation operators is most similar to human judgment and thus well-suited as a general default merging/aggregation operator for search and query answering in the Social Semantic Web.



THANK YOU

Questions?

