Containment of Conjunctive Queries on Annotated Relations

TJ Green, University of Pennsylvania

Symposium on Database Provenance, University of Edinburgh

May 21, 2008


Provenance and Query Optimization

• Many kinds of semiring-based provenance annotations to choose from:
– lineage
– why-provenance
– minimal witness why-provenance
– provenance polynomials
– ...

• These seem to keep track of more/less information

• A fundamental question: how does this affect query optimization?


Conjunctive Queries on K-Relations

• Datalog-style syntax for conjunctive queries (CQs):

Q(x,y) :- R(x,z), R(z,y)

• Semantics of applying the CQ to a K-relation $R : D \times D \to K$:

$Q(a,b) = \sum_{z \in D} R(a,z) \cdot R(z,b)$

• # of repetitions of an atom in the body matters

• For unions of conjunctive queries (UCQs) (equivalent to positive RA), sum over CQs:

P(x,y) :- R(x,z), R(z,y)
P(x,y) :- R(x,w), R(y,w)

• Semantics of UCQ applied to R, a sum over CQs:

$P(a,b) = \sum_{z \in D} R(a,z) \cdot R(z,b) + \sum_{w \in D} R(a,w) \cdot R(b,w)$
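The sum-of-products semantics of Q(x,y) :- R(x,z), R(z,y) can be sketched in a few lines of Python. This is only an illustration, not code from the talk: the function name `eval_q`, the dictionary encoding of a K-relation, and the sample instance are all assumptions made here. The semiring is passed in as `plus`, `times`, and `zero`, so the same code runs for K = N (bags) and K = B (sets).

```python
def eval_q(R, domain, plus, times, zero):
    """Evaluate Q(x,y) :- R(x,z), R(z,y) on a K-relation R.

    R maps pairs to annotations in K; pairs not listed are annotated with zero.
    """
    out = {}
    for a in domain:
        for b in domain:
            acc = zero
            for z in domain:
                # One term R(a,z) * R(z,b) of the sum over z in D.
                acc = plus(acc, times(R.get((a, z), zero), R.get((z, b), zero)))
            if acc != zero:
                out[(a, b)] = acc
    return out

# Hypothetical bag instance (K = N): annotations are multiplicities.
domain = ["a", "b", "c"]
R_bag = {("a", "b"): 2, ("b", "c"): 3, ("a", "c"): 1, ("c", "c"): 1}
bag_result = eval_q(R_bag, domain, lambda x, y: x + y, lambda x, y: x * y, 0)

# The same tuples under set semantics (K = B): annotations are booleans.
R_set = {t: True for t in R_bag}
set_result = eval_q(R_set, domain, lambda x, y: x or y, lambda x, y: x and y, False)

print(bag_result)
print(set_result)
```

On this instance the tuple (a,c) gets multiplicity 2·3 + 1·1 = 7 under bag semantics and simply `True` under set semantics.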


Choice of K Affects Query Optimization

K = N (bag semantics) differs from K = B (set semantics), e.g., the conjunctive queries

Q1(x) :- R(x,y), R(x,z)
Q2(u) :- R(u,v)

are set-equivalent, but not bag-equivalent
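A small Python sketch makes the gap between the two semantics concrete on one instance. The function names `q1` and `q2`, the dictionary encoding, and the sample relation are assumptions for illustration. Agreement on a single instance does not prove set-equivalence, but one disagreeing instance is enough to refute bag-equivalence.

```python
def q1(R, plus, times, zero):
    """Q1(x) :- R(x,y), R(x,z): sum over pairs of tuples sharing the first column."""
    out = {}
    for (x, _y), r1 in R.items():
        for (x2, _z), r2 in R.items():
            if x == x2:
                out[x] = plus(out.get(x, zero), times(r1, r2))
    return out

def q2(R, plus, times, zero):
    """Q2(u) :- R(u,v): sum over tuples with first column u."""
    out = {}
    for (u, _v), r in R.items():
        out[u] = plus(out.get(u, zero), r)
    return out

# Hypothetical instance with two tuples (a,b) and (a,c).
R_bag = {("a", "b"): 1, ("a", "c"): 1}      # K = N, multiplicities
R_set = {t: True for t in R_bag}            # K = B, booleans

nat = (lambda x, y: x + y, lambda x, y: x * y, 0)
boolean = (lambda x, y: x or y, lambda x, y: x and y, False)

print(q1(R_bag, *nat), q2(R_bag, *nat))          # the bag answers differ
print(q1(R_set, *boolean), q2(R_set, *boolean))  # the set answers agree
```

Under bag semantics Q1 counts every ordered pair of matching tuples, so it returns 4 for `a` where Q2 returns 2.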

                                              Conjunctive Queries (CQs)                 Unions of Conjunctive Queries (UCQs)
Bag Semantics Containment ($\sqsubseteq_N$)   ? ($\Pi_2^p$-hard) [Chaudhuri&Vardi 93]   undecidable [Ioannidis&Ramakrishnan 95]
Bag Semantics Equivalence ($\equiv_N$)        isomorphism ($\cong$) [CV 93]             ?


Our Contributions

• We make a systematic study of query containment and query equivalence for various provenance models

• We show that K-containment and K-equivalence of CQs and UCQs are decidable for lineage, why-provenance, and the provenance polynomials N[X], as well as a new model, B[X]

• The decision procedures are based on interesting variations of containment mappings

• We analyze the complexity in each case


Our Contributions

• As a corollary of the decidability result for N[X]-equivalence of UCQs, we also fill in a gap in the chart for bag semantics:

                                              Conjunctive Queries (CQs)                 Unions of Conjunctive Queries (UCQs)
Bag Semantics Containment ($\sqsubseteq_N$)   ? ($\Pi_2^p$-hard) [Chaudhuri&Vardi 93]   undecidable [Ioannidis&Ramakrishnan 95]
Bag Semantics Equivalence ($\equiv_N$)        isomorphism ($\cong$) [CV 93]             isomorphism ($\cong$)


K-Containment for Queries

• For semiring K, define $a \le_K b \Leftrightarrow \exists c.\ a + c = b$. If $\le_K$ is a partial order, it is called the natural order, and K is said to be naturally ordered

• B, N, lineage, why-provenance, B[X], and N[X] are all naturally ordered

• We define K-containment using the natural order:

$Q_1 \sqsubseteq_K Q_2 \Leftrightarrow \forall I\ \forall t\ \ Q_1(I)(t) \le_K Q_2(I)(t)$

$Q_1 \equiv_K Q_2 \Leftrightarrow \forall I\ \forall t\ \ Q_1(I)(t) = Q_2(I)(t)$


A Hierarchy of Semiring Provenance (1)

• Provenance polynomials $(N[X], +, \cdot, 0, 1)$ – tracks calculations abstractly; most general

e.g., $2p^2r + 3ps + ps^3$

• Drop coefficients to get $(B[X], +, \cdot, 0, 1)$

$p^2r + ps + ps^3$

• Drop exponents to get why-prov. $(P(P(X)), \cup, \Cup, \emptyset, \{\emptyset\})$

$\{\{p,r\}, \{p,s\}\}$

• Flatten set-of-sets to get lineage $(P(X), +, \cdot, \bot, \emptyset)$

$\{p,r,s\}$

• Drop, flatten, etc. correspond to surjective semiring homomorphisms
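The three "drop" and "flatten" steps can be replayed on the running example with a short Python sketch. The encoding is an assumption made for illustration: an N[X] polynomial is a dictionary from monomials to coefficients, and a monomial is a tuple of (variable, exponent) pairs. The sketch ignores the special zero element of the lineage semiring and only handles non-zero annotations.

```python
# 2p^2r + 3ps + ps^3 as {monomial: coefficient}; encoding chosen for this sketch.
poly = {
    (("p", 2), ("r", 1)): 2,
    (("p", 1), ("s", 1)): 3,
    (("p", 1), ("s", 3)): 1,
}

def drop_coefficients(nx):
    """N[X] -> B[X]: keep the set of monomials with a non-zero coefficient."""
    return frozenset(m for m, c in nx.items() if c > 0)

def drop_exponents(bx):
    """B[X] -> why-provenance: each monomial becomes its set of variables."""
    return frozenset(frozenset(v for v, _e in m) for m in bx)

def flatten(why):
    """Why-provenance -> lineage: union of all witness sets."""
    return frozenset(v for witness in why for v in witness)

bx = drop_coefficients(poly)
why = drop_exponents(bx)
lin = flatten(why)

print(sorted(sorted(w) for w in why))
print(sorted(lin))
```

The monomials ps and ps^3 collapse to the same witness {p,s} once exponents are dropped, which is why the why-provenance has two witnesses rather than three.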


A Hierarchy of Semiring Provenance (2)

• Suppose $h : K_1 \to K_2$ is a semiring homomorphism. Then $a \le_{K_1} b$ implies $h(a) \le_{K_2} h(b)$. If h is also surjective, then $h(a) \le_{K_2} h(b)$ implies $a \le_{K_1} b$.

• Definition: $K_1 \preceq K_2$ means $P \sqsubseteq_{K_2} Q$ implies $P \sqsubseteq_{K_1} Q$

• Proposition: for any positive K, B $\preceq$ K $\preceq$ N[X]

(All those we consider are positive.) Moreover:

• Proposition (Provenance Hierarchy):

B $\preceq$ lineage $\preceq$ Why-Prov. $\preceq$ B[X] $\preceq$ N[X]


Containment Mappings

• A containment mapping from CQ Q to CQ P is a function $h : Vars(Q) \to Vars(P)$ such that
– head of Q is mapped to head of P
– every atom in body of Q is mapped to an atom in body of P

• Theorem [CM77]: For CQs P, Q we have $P \sqsubseteq_B Q$ iff there is a containment mapping from Q to P
– e.g. Q1(x) :- R(x,y), R(x,z) and Q2(u) :- R(u,v)
– h which sends $u \mapsto x$ and $v \mapsto y$ is a containment mapping

• Checking for existence of containment mapping is NP-complete
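The definition of a containment mapping can be turned into a brute-force search that tries every function from Vars(Q) to Vars(P). This is a sketch under assumptions made here, not an algorithm from the talk: a CQ is encoded as a pair (head variables, list of atoms), atoms contain only variables (no constants), and the name `containment_mappings` is hypothetical. The search is exponential in the number of variables, in line with the NP-completeness of the problem.

```python
from itertools import product

def containment_mappings(Q, P):
    """Yield every containment mapping h from CQ Q to CQ P (brute force)."""
    q_head, q_body = Q
    p_head, p_body = P
    q_vars = sorted({v for _rel, args in q_body for v in args} | set(q_head))
    p_vars = sorted({v for _rel, args in p_body for v in args} | set(p_head))
    p_atoms = set(p_body)
    for image in product(p_vars, repeat=len(q_vars)):
        h = dict(zip(q_vars, image))
        # The head of Q must be mapped to the head of P.
        if tuple(h[v] for v in q_head) != tuple(p_head):
            continue
        # Every atom in the body of Q must land on an atom in the body of P.
        if all((rel, tuple(h[v] for v in args)) in p_atoms for rel, args in q_body):
            yield h

# The example from the slide.
Q1 = (("x",), [("R", ("x", "y")), ("R", ("x", "z"))])
Q2 = (("u",), [("R", ("u", "v"))])

to_q1 = list(containment_mappings(Q2, Q1))   # witnesses for Q1 contained in Q2
to_q2 = list(containment_mappings(Q1, Q2))   # witnesses for Q2 contained in Q1
print(to_q1)
print(to_q2)
```

Mappings exist in both directions, which matches the earlier observation that Q1 and Q2 are set-equivalent.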


Canonical Databases

• Take body of CQ, “freeze” into database instance [CM77], and tag each tuple with a “tuple id”

• We’ll denote by $can_K(Q)$ the canonical database for Q with abstract tags from K

• e.g., Q(w) :- R(u,v), R(v,w)

$can_{N[X]}(Q) = can_{B[X]}(Q)$ = R:

u  v  x1
v  w  x2

$can_{lin}(Q)$ = R:

u  v  {x1}
v  w  {x2}

$can_{why}(Q)$ = R:

u  v  {{x1}}
v  w  {{x2}}


Lineage-Containment of CQs

• Covering set of containment mappings: for every atom A in the body of P there is a containment mapping $h : Q \to P$ with A in the image of h

• Theorem: For CQs P, Q the following are equivalent:
1. $P \sqsubseteq_{lin} Q$
2. $P(can_{lin}(P)) \subseteq_{lin} Q(can_{lin}(P))$
3. there is a covering set of containment mappings from Q to P

• Note: covering sets of containment mappings were identified in [CV 93] as a necessary (but not sufficient) condition for bag-containment of CQs


Why-Containment of CQs

• A containment mapping is onto if it induces a surjection on atoms

• Theorem: For CQs P, Q the following are equivalent:
1. $P \sqsubseteq_{why} Q$
2. $P(can_{why}(P)) \subseteq_{why} Q(can_{why}(P))$
3. there is an onto containment mapping $h : Q \to P$

• Note: onto containment mappings were identified in [CV 93] as a sufficient (but not necessary) condition for bag-containment of CQs


B[X], N[X]-containment of CQs

• A containment mapping is exact if it induces a bijection on atoms

• Theorem: For CQs P, Q and for $K \in \{B[X], N[X]\}$ the following are equivalent:
1. $P \sqsubseteq_K Q$
2. $P(can_K(P)) \subseteq_K Q(can_K(P))$
3. there is an exact containment mapping $h : Q \to P$

• Another way to think of exact containment mappings: by unifying variables in Q, you get a query isomorphic to P


So Far

• K-containment of CQs is decidable for all the provenance models in the hierarchy

• Next, we indicate which steps in the hierarchy are strict, and which collapse:

B $\prec$ lineage $\prec$ Why-Prov. $\prec$ B[X] $\approx$ N[X]


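Separating the Models for $\sqsubseteq$ of CQs

• B $\prec$ lineage:
Q1(x,y) :- R(x,y), R(x,z)
Q2(x,y) :- R(x,y)
$Q_1 \sqsubseteq_B Q_2$ but $Q_1 \not\sqsubseteq_{lin} Q_2$

• lineage $\prec$ why:
Q1(x) :- R(x,y), R(x,z)
Q2(x) :- R(x,y)
$Q_1 \sqsubseteq_{lin} Q_2$ but $Q_1 \not\sqsubseteq_{why} Q_2$

• why $\prec$ B[X]:
Q1(x,y) :- R(x,y)
Q2(x,y) :- R(x,y), R(x,z)
$Q_1 \sqsubseteq_{why} Q_2$ but $Q_1 \not\sqsubseteq_{B[X]} Q_2$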
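The three separating pairs can be checked mechanically against the covering, onto, and exact mapping criteria from the earlier theorems. The Python sketch below is illustrative only and rests on assumptions made here: CQs are encoded as (head variables, list of atoms), atoms contain only variables, no body repeats an atom (so "surjection on atoms" reduces to a set comparison), and the names `images` and `contained` are hypothetical.

```python
from itertools import product

def images(Q, P):
    """For each containment mapping h from Q to P, yield the set of P-atoms hit by h."""
    (q_head, q_body), (p_head, p_body) = Q, P
    q_vars = sorted({v for _rel, args in q_body for v in args} | set(q_head))
    p_vars = sorted({v for _rel, args in p_body for v in args} | set(p_head))
    for image in product(p_vars, repeat=len(q_vars)):
        h = dict(zip(q_vars, image))
        hit = {(rel, tuple(h[v] for v in args)) for rel, args in q_body}
        if tuple(h[v] for v in q_head) == tuple(p_head) and hit <= set(p_body):
            yield hit

def contained(P, Q, model):
    """Decide whether P is K-contained in Q, using mappings from Q to P."""
    p_atoms = set(P[1])
    hits = list(images(Q, P))
    if model == "B":        # any containment mapping
        return bool(hits)
    if model == "lineage":  # covering set of containment mappings
        return bool(hits) and set().union(*hits) == p_atoms
    if model == "why":      # onto containment mapping
        return any(hit == p_atoms for hit in hits)
    if model == "B[X]":     # exact containment mapping (bijection on atoms)
        return len(Q[1]) == len(P[1]) and any(hit == p_atoms for hit in hits)
    raise ValueError(model)

R = lambda *args: ("R", args)

# B vs. lineage
A1, A2 = (("x", "y"), [R("x", "y"), R("x", "z")]), (("x", "y"), [R("x", "y")])
# lineage vs. why
B1, B2 = (("x",), [R("x", "y"), R("x", "z")]), (("x",), [R("x", "y")])
# why vs. B[X]
C1, C2 = (("x", "y"), [R("x", "y")]), (("x", "y"), [R("x", "y"), R("x", "z")])

print(contained(A1, A2, "B"), contained(A1, A2, "lineage"))
print(contained(B1, B2, "lineage"), contained(B1, B2, "why"))
print(contained(C1, C2, "why"), contained(C1, C2, "B[X]"))
```

Each pair holds for the coarser model and fails for the next finer one, which is what makes the corresponding step of the hierarchy strict.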


From Containment to Equivalence

• {Onto|exact} containment mappings in both directions imply that the CQs are isomorphic, so why-provenance, B[X], and N[X] collapse to:

$P \equiv_{why} Q \Leftrightarrow P \equiv_{B[X]} Q \Leftrightarrow P \equiv_{N[X]} Q \Leftrightarrow P \cong Q$

• In contrast, for lineage, having sets of covering containment mappings in both directions does not imply isomorphism (but still decidable)


From CQs to UCQs

• For idempotent semirings (where + is idempotent) this is easy. B, PosBool(B), lineage, why-provenance, and B[X] are idempotent; N[X] is not (omitted)

• Proposition [after SY80]: If K is idempotent, then for UCQs $\bar{P}, \bar{Q}$ we have $\bar{P} \sqsubseteq_K \bar{Q}$ iff for every CQ P in $\bar{P}$ there is a CQ Q in $\bar{Q}$ such that $P \sqsubseteq_K Q$

• Corollary: For idempotent K, the problems of checking K-equivalence of CQs and K-equivalence of UCQs are polynomially equivalent


N[X]- and Bag-Equivalence of UCQs

• As with CQs, N[X]-equivalence of UCQs turns out to be the same as isomorphism:
Theorem: For UCQs P, Q, $P \equiv_{N[X]} Q$ iff $P \cong Q$

• But, it turns out that N[X]-equivalence and N-equivalence of UCQs are intimately related:
Theorem: for UCQs P, Q, $P \equiv_{N[X]} Q$ iff $P \equiv_N Q$

Thus:
Corollary: for UCQs P, Q, $P \equiv_N Q$ iff $P \cong Q$


Summary: Complexity Results

• Theorem: checking for {covering set of|onto|exact} containment mappings is NP-complete

• Checking for query isomorphism: believed >P, <NP

                       B           PosBool(B)    N                           Lineage  Why-Pr.  B[X]     N[X]
CQs   $\sqsubseteq_K$  NP [CM 77]  NP [PODS 07]  ? ($\Pi_2^p$-hard) [CV 93]  NP-ct    NP-ct    NP-ct    NP-ct
      $\equiv_K$       NP ibid.    NP ibid.      $\cong$ ibid.               NP-ct    $\cong$  $\cong$  $\cong$
UCQs  $\sqsubseteq_K$  NP [SY 80]  NP ibid.      undec [IR 95]               NP-ct    NP-ct    NP-ct    PSPACE
      $\equiv_K$       NP ibid.    NP ibid.      $\cong$                     NP-ct    NP-ct    NP-ct    $\cong$


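Summary: Provenance Hierarchy

CQs, $\sqsubseteq_K$: B $\approx$ PosB.(B) $\prec$ Lineage $\prec$ N $\prec$ Why-Pr. $\prec$ B[X] $\approx$ N[X]

CQs, $\equiv_K$: B $\approx$ PosB.(B) $\prec$ Lineage $\prec$ N $\approx$ Why-Pr. $\approx$ B[X] $\approx$ N[X]

UCQs, $\sqsubseteq_K$: B $\approx$ PosB.(B) $\prec$ Lineage $\prec$ Why-Pr. $\prec$ B[X] $\prec$ N[X]

UCQs, $\equiv_K$: B $\approx$ PosB.(B) $\prec$ Lineage $\prec$ Why-Pr. $\prec$ B[X] $\prec$ N[X]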


Related Work

• Already mentioned
– Set-cont. and equiv. of CQs [Chandra&Merlin 77]
– Set-cont. and equiv. of UCQs [Sagiv&Yannakakis 80]
– Bag-cont. of UCQs [Ioannidis&Ramakrishnan 95]
– Bag-equiv. of CQs [Chaudhuri&Vardi 93]

• Containment of CQs with where-provenance [Tan 03]

• Bag-set semantics [CV 93], combined semantics [Cohen 06]
– For K-relations: support operator of [Geerts&Poggi 08] generalizes duplicate elimination

• Bag-containment of CQs [Jayram+ 06]


Future Work

• Loose ends:
– Lower bound for N[X]-containment of UCQs (we gave only a PSPACE upper bound)
– Generalize results for specific semirings to semirings with certain properties?

• Beyond UCQs: Datalog
– is K-containment of Datalog programs the same as set-containment when K is a distributive lattice?
– is bag-equivalence/N[X]-equivalence undecidable for Datalog?

• Could semiring framework give any insight into bag-containment of CQs?

• Query optimization for annotated XML
