Local as View: First steps

30
2005 lav-i 1 Local as View: First steps Introduction and an example Rewriting queries using views The Information Manifold system

description

Local as View: First steps. Introduction and an example Rewriting queries using views The Information Manifold system. Introduction and an example. LAV : local as view The sources are modeled as views, typically conjunctive, of the global (virtual) schema (why conjunctive?) - PowerPoint PPT Presentation

Transcript of Local as View: First steps

Page 1: Local as View:  First steps

2005 lav-i 1

Local as View: First steps

Introduction and an exampleRewriting queries using viewsThe Information Manifold system

Page 2: Local as View:  First steps

2005 lav-i 2

Introduction and an example

LAV: local as view

The sources are modeled as views, typically conjunctive, of the global (virtual) schema

(why conjunctive?)

Advantage: adding / removing a source are local to the source, and do not impact the global schema (if sufficiently general)

Page 3: Local as View:  First steps

2005 lav-i 3

A source is associated with a view definition; what are the assumptions on its contents?

Sound: contains a subset of the view definition

Complete: contains all the data in the definition (possibly more)

Exact: contains exactly the data in the view definition

The common assumption: sound views (fits the WWW environment)

Page 4: Local as View:  First steps

2005 lav-i 4

Answering a conjunctive query:

Rewrite the query in terms of the views

Equivalent rewriting: the rewriting is equivalent to the query

Contained rewriting: the rewriting is contained in the query

Maximally contained rewriting: • contained in query,

• not contained in another rewriting

For sound views, it is reasonable to search for maximally contained rewritings , then take their union

Page 5: Local as View:  First steps

2005 lav-i 5

Example: a university db (each attribute is explained when it first occurs)

course(c#,ti) //c-number, title

teaches(pr, c#, qu) // prof, quarter

registered(st, c#, qu) //student

major(st, dp) //dept

advises(pr, st)

Views:

v0(p, s):- advises(p,s)

v1(p,s, q) :- registered(s,c,q), teaches(p,c,q), q>= w97 //winter97

Page 6: Local as View:  First steps

2005 lav-i 6

Q: select a.pr, a.st, r.qu

from registered r, teaches t, advises a

where r.c# = t.c# and r.qu = t.qu and a.pr = t.pr a.st = r.st

and r.qu >=w98

As a conjunctive query : (with individual variables) jump

q(p,s,q) :- registered(s,c,q), teaches(p,c,q), advises(p,s), q>=w98

Let Q’: q(p,s,q):- v1(p.s, q), v0(p,s), q>=w98

Expanding by the view defs:

Q’’: q(p,s,q) :- registered(s,c,q), teaches(p,c,q), q>= w97 ,

advises(p,s), q>=w98

Q’’ is equivalent to Q, hence this is a good rewriting

Page 7: Local as View:  First steps

2005 lav-i 7

Assume v1 is replaced by

v2(s,q) :- registered(s,c,q), teaches(p,c,q), q>= w98

Can we answer the query?

The expansion of v2(s,q), v0(p,s):

q(p,s,q) :- registered(s,c,q), teaches(p’,c,q), q>= w98 ,

advises(p,s),

We use p’ (rather than p), since it is an existential variable of v1

This is not equivalent to the query, nor contained in it

Assume q was dropped from both v1 and Q, could we answer Q using v0 and v1?

Page 8: Local as View:  First steps

2005 lav-i 8

Assume v1 is replaced by v3:

select r.st, t.pr, r.qu

from registered r, teaches t

where r.c# = t.c# and r.qu >= win98

Convert to conjunctive!

Can we answer the query?

Interim summary:

To be useable in a query rewriting, views have to

• export variables that are subject to (arithmetic or join) conditions in the body of the query , or to contain such joins in their bodies,

• treat query head variables as head variables of the view

Page 9: Local as View:  First steps

2005 lav-i 9

Rewriting queries using views

Scenario:

We have a collection of view definitions

V = {v1, v2, …, vn}

Given a query Q, we ask:

• is there a rewriting that uses the views?

• If there is, how can it be computed?

Applications:

• Query optimization by using materialized views

• Data integration

Page 10: Local as View:  First steps

2005 lav-i 10

Example :

Q: q(X, U) :- p(X, Y), r(Y, Z), s(X, W), t(W, U)

V: v1(A,B) :- p(A,C), r(C,B), s(A, D)

A partial rewriting:

Q’: q(X, U) :- v1(X, Z), s(X, W), t(W, U)

The atom s(A, D) in v1 does not replace s(X, W) in Q (and Q’)

Given also v2(A,B) :- s(A, C), t(C, B), r(D, E)

Q’’: q(X, U) :- v1(X, Z), v2(X, U)

Is a complete equivalent rewriting

Note: looking at Q’, it is not evident that by replacing the last two atoms with v2 we obtain an equivalent rewriting; v2 contains r(D, E), and Q’ does not contain r.

Page 11: Local as View:  First steps

2005 lav-i 11

Classification:

• Partial rewriting: only part of the query is replaced by views

useful in query optimization, not in data integration

• Complete rewriting: only views occur in it

useful in both scenarios

• Equivalent rewriting: yields the same answer on all db’s

useful in both cases

• Contained rewriting: its expansion is a contained query

not interesting for query optimization, useful for integration

• Maximal contained rewriting: contained in query, not in another rewriting

We assume (unless stated otherwise) conjunctive queries and views, no b.i. preds

Page 12: Local as View:  First steps

2005 lav-i 12

Assume a query Q q(..):- Qbody is given.

A view is usable in an equivalent rewriting if it occurs in some partial equivalent rewriting of Q

Claim 1 : jump

That is, if v(D) is empty, for some db D, then so is Q(D)

Proof:

If the containment does not hold, then there is a D s.t. v(D) is empty, Q(D) is not, so the rewriting is not equivalent

If it holds, then Q’: q(..):- Qbody, v(Y) (Y new vars) is an equivalent partial rewriting

(bold letters – vectors of variables/constants)

(in equiv. rewriting)

(the projection of Q on the empty set of attributes is contained in that of v)

v is usable iff (Q) (v) Æ ÆP Í P

Page 13: Local as View:  First steps

2005 lav-i 13

Corollary: checking for usability is NP-complete

Corollary : if v1,…, vk are usable, then

Q’: q(..):- Qbody, v1(Y1), …, vk(Yk) (Yi new and distinct) is an equivalent rewriting

Apply the containment map hi to each v(Yi), obtaining Q’’ that is contained in Q’ (and does not introduce any new vars)

Q’’ contains Q, hence Q, Q’ Q’’ are equivalent

Q’’ still contains all atoms of bodyQ; some of these may be removed by standard minimization; then some views may be removed

How many must be left?

(Q) (v):

iff there is a containment mapping from body(v) to body(Q)Æ ÆP Í P

Page 14: Local as View:  First steps

2005 lav-i 14

Claim 2 : jump

If Qbody has n atoms, then a minimized, equivalent or contained, rewriting Q’ contains at most n (view or regular) atoms

Proof: consider the expansion exp(Q’), and the containment mapping from Q to exp(Q’); its image has at most n atoms

Q: q(..) :- p1(..), pk(..) pk+1(..) pn(..)

Q’: q(..):- p1,(..), , pk(..), v1(..), vm(..)

r1,1 .. r1k1 rm,1 .. rmk1

Page 15: Local as View:  First steps

2005 lav-i 15

Claim 3: The problem:

is there a complete equivalent rewriting

is NP-complete

Proof: given v, construct

v’: Qhead(..):- Qbody, vbody

Then v’ is usable (NP-complete) iff v’ is a complete rewriting

Claim 4:

The problem above is NP-complete even if the query and the views do not contain repeated predicates in their bodies (which simplifies the search for containment mappings)

And also if we are looking for a complete contained rewriting

Proof: next page

Page 16: Local as View:  First steps

2005 lav-i 16

Exact cover by 3-sets : (NP-complete problem)

Given s={e1,…,en} and sets s1,…,sk, each of size 3, is there a cover of s by a subset of the {sj} where each element occurs in just one set?

A reduction from the above to finding a complete rewriting:

The query Q: q( ) :- p1(A1,1,…,A1,k),…, pn(An,1,…, An,k)

Note: since each sj contains 3 elements, Sj occurs in 3 atoms – this is a join condition in the query body

The view vj (for sj): the 3 atoms (as above) for the elements of sj, and a head that contain all the Sm that occur in its body, but not Sj (in the body, Sj is existential)

In pi, Ai,j is Sj if ei sj, a new variable otherwiseÎ

Page 17: Local as View:  First steps

2005 lav-i 17

Assume a complete contained rewriting Q’ exists

There is a containment mapping h : body(Q) body( exp(Q’))

Note: Sj in exp(vj) is renamed to a new var, say Gj, that does not occur in

expansion of other views, since it is existential

If h maps pi(..) from Q to the expansion of vj, then the expansion contain pi, so and h(Sj)=Gj

the atoms pm, pq in Q contain Sj

And the body of vj contains pi, pm, pq, with Gj instead of Sj

Since h(Sj) = Gj, h maps pm and pq from Q also to exp(vj)(Since vj does not export Sj, the join condition on Sj can be satisfied only if

all three atoms pi, pm, pq are mapped by h to vj)

The views with expansions in image(h) provide an exact cover

ei sjÎ

Assume sj={ei, em, eq}

Page 18: Local as View:  First steps

2005 lav-i 18

The other direction (if an exact cover exists, it gives a complete rewriting) - left for you

Also: a complete contained rewriting in this case is always an equivalent rewriting

Page 19: Local as View:  First steps

2005 lav-i 19

Comments:• For contained rewritings, the characterization (claim1, p.12)

of usability does not hold (even if v(D) is empty on some D, and Q(D) is not, v may be useful for a contained rewriting)

• Claim 2 (p. 14) holds also for contained rewritings : search for rewritings is restricted by size of query body

but: if db satisfies functional dependencies, the size bound fails

Example:

Database: a single relation e(X, Y, Z), with fd: X Y

Query Q: q(X, Y, Z):- e(X, Y, Z)

Views: v1(X,Y):- e(X, Y, Z), v2(X,Z):- e(X, Y, Z)

A complete rewriting, of minimal size :

Q’: q(X, Y, Z):- v1(X, Y), v2(X, Z)

Page 20: Local as View:  First steps

2005 lav-i 20

If the query and views contain b.i. predicates (comparisons) :

Let Q’ be Q w/o the comparisons

If Q’’ is a complete rewriting for Q’, just add to it the comparisons in Q (with variables suitably renamed), to obtain a complete contained rewriting for Q (or a contradiction!)

Example: from p. 6

Views: v0(p, s):- advises(p,s)

v1(p,s, q) :- registered(s,c,q), teaches(p,c,q), q>= w97

A query: q(p,s,q) :- registered(s,c,q), teaches(p,c,q), advises(q,s), q>=w98

A rewriting: q(p,s,q):- v1(p.s, q), v0(p,s), q>=w98

Can we change the example to obtain a contradiction?

Page 21: Local as View:  First steps

2005 lav-i 21

The Information Manifold system (IM)

A LAV system, implemented in Bell Labs around 94-96, supported about 100 WWW sources

Main ideas:

• Rewriting queries using views for answering queries

• Using fine-grained descriptions of sources to eliminate irrelevant sources

• Support for restricted-capability sources

From now, rewriting always means complete, contained

Page 22: Local as View:  First steps

2005 lav-i 22

Rewriting views using views in IM – the bucket algorithm :

The goal: Finding rewritings is difficult in worst case

But good heuristics often work well in practice

Outline:

1) Find for each query atom – a subgoal, views that may be targets of a mapping from it; put them in a bucket for the subgoal (these are candidates) (hopefully, this filters out many candidates)

2) Combine views, one from each bucket; test if

1) there is a containment mapping;

2) adding constraints yields a satisfiable query on the views

3) Minimize each rewriting, eliminate those contained in others

4) Take the union

Page 23: Local as View:  First steps

2005 lav-i 23

Step 1 – computing the bucket for a query atom :

Assume Q: q(X) :- p1(U1), …, pn(Un), C(Q)

We add h(vj) to bucket(pi(Ui)) if:

• vj’s definition contains an atom pi(Y) ; // if vj is used, then possibly a containment mapping sends pi(Ui) to pi(Y)

• if the k’th var of Ui is a head var of Q, then the k’th var of Y is a head var of vj (this var is needed for the query result)

• h renames variables of vj and (some vars of) Q as follows:• If y, k’th in Y, is a head var of vj, then rename to k’th var of Ui

// this may be a head var of Q, or used in a join

• Otherwise, h(y) is a new distinct var

• h(pi(Ui)) = h(pi(Y))

• h(C(Q)) and h(C(vj)) is satisfiable

Page 24: Local as View:  First steps

2005 lav-i 24

Example (same university db) :

Views:

v1(s, c, q, t) :- registered(s, c, q), course(c, t), c>=500, q>=a98

v2(s, p, c, q) :- registered(s, c, q), teaches(p, c, q)

v3(s, c) :- registered(s, c, q), q<=a94

v4(p, c, t, q) :- registered(s, c, q), teaches(p, c, q), course(c, t), q<=a97

Query: q(s, p, c) :- registered(s, c, q), teaches(p, c, q), course(c, t), c>=300, q>=a95

Bucket for registered(s, c, q): v1(s, c, q, t1), v2(s, p1, c, q) (but not v3 – for one(two?) reasons, not v4 – for one reason)

Bucket for teaches(p, c, q): v2(s1, p, c, q), v4(p, c, t2, q)

Bucket for course(c, t):

Page 25: Local as View:  First steps

2005 lav-i 25

Step 2 – combining views, testing for satisfiability of rewriting :

Example (cont’d): Combining 1st element of each bucket:

Q1: q(s, p, c) :- v1(s, c, q, t1), v2(s1, p, c, q), v1(s2, c, q1, t)

Minimize by s2s, cc, q1q, t t1 (3rd atom is removed)

Q1’: q(s, p, c) :- v1(s, c, q, t1), v2(s1, p, c, q)

Expand:v1: registered(s, c, q), course(c, t1), c>=500, q>=a98,

v2: registered(s, c, q), teaches(p, c, q)

With query constraints (under containment mapping) c>=300, q>=a95, this is satisfiable, so we have a contained rewriting

Page 26: Local as View:  First steps

2005 lav-i 26

Another combination:

Q2: q(s, p, c) :- v1(s, c, q, t1), v4(p, c, t2, q) , v4(p2, c, t, q2)

Can minimize, remove 3rd atom

Expansion:

v1: registered(s, c, q), course(c, t1), c>=500, q>=a98

v4 : registered(s, c, q), teaches(p, c, q), course(c, t2), q<=a97

The conjunction is unsatisfiable (conditions in two views are contradictory), this is not a rewriting

Taking the union of all the rewritings that pass the filters, we obtain (in this example) a maximally contained rewriting

Page 27: Local as View:  First steps

2005 lav-i 27

Another example:

The database contains a single relation :

flight(from, to, carrier)

The views are

v1(F, T) :- flight(F, T, wn) // wn is Southwest airlines

v2(F, T) :- flight (F, T, ua) // United airlines

v3(F, T, C) :- flight(F, Z, C), flight(Z, T, C)

A user wants to fly from Tucson to S.F, w/o changing the airline, and with at most one stop

Q: q(C) :- flight(tus, sfo, C)

q(C) :- flight(tus, Z, C), flight(Z, sfo, C)

Page 28: Local as View:  First steps

2005 lav-i 28

Buckets are computed for each sub-query: (an extension!)

For flight(tsu, sfo, C):

v1(tus, sfo), v2(tus, sfo), v3(tus, T1, C), v3(F1, sfo, C)

Each of these gives a candidate rewriting for q, of which only

q1(wn) :- v1(tus, sfo)

q2(ua) :- v2(tus, sfo)

remain

Page 29: Local as View:  First steps

2005 lav-i 29

An alternative presentation of the bucket (p. 23)

Step 1 – computing the bucket for a query atom :

Assume Q: q(X) :- p1(U1), …, pn(Un), C(Q)

We add h(vj) to bucket(pi(Ui)) if:

• vj’s definition contains an atom pi(Y) & pi(U) and pi(Y) unify // if vj is used, then possibly a containment mapping sends pi(Ui) to pi(Y)

• Some more conditions (next page) are satisfied

Thus, if vj’s body contains pi twice, it may be added to the bucket (at most) twice

For h, see next page

Page 30: Local as View:  First steps

2005 lav-i 30

The additional conditions & the definition of h:• (condition) if the k’th var of Ui is a head var of Q, or a join var of

Q, then the k’th var of Y is a head var of vj (this var is needed for the query result or for the join)

(the condition on join vars does not occur in Alon’s survey paper!?)

• h renames variables of vj as follows:• If y, k’th in Y, is a head var of vj, then rename to k’th var of Ui

• Otherwise, h(y) is a new distinct var

• h(pi(Ui)) = h(pi(Y))

this is simply a renaming of the variables of vj• (condition) Since pi(Y) and pi(U) are unifiable, h can be extended

to (some variables of) Q, so that h(pi(Ui)) = h(pi(Y))

• (condition) h(C(Q)) and h(C(vj)) is satisfiable (h is defined on some variables of C(Q), so it maps part of C(Q); that part

should be consistent with h(C(vj)) )