CS240A: Databases and Knowledge Bases From Differential Fixpoints to Magic Sets
Carlo Zaniolo
Department of Computer Science
University of California, Los Angeles
January, 2002
Notes from Chapter 9 of Advanced Database Systems by Zaniolo, Ceri, Faloutsos, Snodgrass, Subrahmanian, and Zicari (Morgan Kaufmann, 1997)
Recursive Predicates
r1: anc(X, Y) ← parent(X, Y).
r2: anc(X, Z) ← anc(X, Y), parent(Y, Z).
r2 is a recursive rule, and a left-linear one.
r1 is a nonrecursive rule defining a recursive predicate; such a rule is called an exit rule.
An alternative definition for anc:
r3: anc(X, Y) ← parent(X, Y).
r4: anc(X, Z) ← anc(X, Y), anc(Y, Z).
Here r4 is a quadratic rule.
Fixpoint Computation
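The inflationary immediate consequence operator for P:
$\Gamma_P(I) = T_P(I) \cup I$
We have:
$\Gamma_P^{\uparrow n}(\emptyset) = T_P^{\uparrow n}(\emptyset)$
$\mathrm{lfp}(T_P) = T_P^{\uparrow \omega}(\emptyset) = \mathrm{lfp}(\Gamma_P) = \Gamma_P^{\uparrow \omega}(\emptyset)$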
Fixpoint Computation (cont.)
Naïve Fixpoint Algorithm for $\Gamma_P$ ($M = \emptyset$ for now):
{ S := M; S' := Γ_P(M);
  while S ≠ S' { S := S';
                 S' := Γ_P(S) } }
We can replace the first $\Gamma_P$ with $\Gamma_E$ and the second one with $\Gamma_R$, denoting respectively the immediate consequence operators for the exit rules and the recursive ones.
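The naive algorithm can be sketched in Python for the left-linear anc program. This is only an illustration: the parent facts, the extra person tom, and the function names t_exit, t_rec and naive_fixpoint are assumptions introduced here, and relations are modeled as plain sets of pairs.

```python
# Hypothetical database: parent(X, Y) facts stored as (parent, child) pairs.
PARENT = {("anne", "silvia"), ("silvia", "marc"), ("marc", "tom")}

def t_exit(parent):
    """T_E for the exit rule: anc(X, Y) <- parent(X, Y)."""
    return set(parent)

def t_rec(anc, parent):
    """T_R for the recursive rule: anc(X, Z) <- anc(X, Y), parent(Y, Z)."""
    return {(x, z) for (x, y) in anc for (y2, z) in parent if y == y2}

def naive_fixpoint(parent):
    """Apply the inflationary operator Gamma(S) = T(S) | S until nothing changes."""
    s = set()                          # S := M, with M empty
    s_next = t_exit(parent) | s        # S' := Gamma_E(M)
    rounds = 0
    while s_next != s:
        s = s_next
        s_next = t_rec(s, parent) | s  # S' := Gamma_R(S), re-deriving old atoms too
        rounds += 1
    return s, rounds

ANC, ROUNDS = naive_fixpoint(PARENT)
```

Every pass joins the whole of S with parent, so atoms found in earlier passes are derived again each time.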
Differential Fixpoint (a.k.a. Seminaive Computation)
Redundant computation: the jth iteration step also re-computes all atoms obtained in the (j−1)th step. Finite-difference techniques trace the derivations over two steps:
1. $S$: the set of atoms obtained up to step $j-1$
2. $S'$: the set of atoms obtained up to step $j$
3. $\delta S = \Gamma_R(S) - S = T_R(S) - S$ denotes the new atoms at step $j$ (i.e., the atoms that were not in $S$ at step $j-1$)
4. $\delta' S = \Gamma_R(S') - S' = T_R(S') - S'$ are the new atoms obtained at step $j+1$.
Differential Fixpoint Algorithm ($M = \emptyset$ for now):
{ S := M; δS := T_E(M);
  S' := S ∪ δS; while δS ≠ ∅
  { δ'S := T_R(S') − S';
    S := S'; δS := δ'S;
    S' := S ∪ δS } }
δanc, anc, and anc' respectively denote ancestor atoms that are in $\delta S$, $S$, and $S' = S \cup \delta S$.
Rule Differentiation
To compute $\delta' S := T_R(S') - S'$ we can use a $T_R$ defined by the following rule:
δ'anc(X, Z) ← anc'(X, Y), parent(Y, Z).
This can be rewritten as:
δ'anc(X, Z) ← δanc(X, Y), parent(Y, Z).
δ'anc(X, Z) ← anc(X, Y), parent(Y, Z).
The second rule can now be eliminated, since it produces only atoms that were already contained in anc', i.e., in the $S'$ computed in the previous iteration.
Thus, for linear rules, replace $\delta' S := T_R(S') - S'$ by $\delta' S := T_R(\delta S) - S'$.
For nonlinear rules the rewriting is more complex.
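For the linear case, the differential algorithm can be sketched in Python as follows. This is a minimal illustration, assuming relations are sets of pairs; the sample facts and the names t_rec and differential_fixpoint are hypothetical. The recursive rule is applied only to the atoms that were new in the previous step.

```python
# Hypothetical database: parent(X, Y) facts stored as (parent, child) pairs.
PARENT = {("anne", "silvia"), ("silvia", "marc"), ("marc", "tom")}

def t_rec(anc_part, parent):
    """Recursive rule applied to part of anc: anc(X, Z) <- anc(X, Y), parent(Y, Z)."""
    return {(x, z) for (x, y) in anc_part for (y2, z) in parent if y == y2}

def differential_fixpoint(parent):
    s = set()                  # S: atoms known before the current step
    delta = set(parent)        # delta S := T_E(M), the exit rule
    s_new = s | delta          # S' := S union delta S
    while delta:
        # Linear rule: join only the new atoms, then drop what is already known.
        delta_next = t_rec(delta, parent) - s_new
        s = s_new
        delta = delta_next
        s_new = s | delta
    return s_new

ANC = differential_fixpoint(PARENT)
```

The loop stops as soon as a step produces no new atoms, and each step joins a relation the size of the last increment rather than all of anc.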
Non Linear Rules
ancs(X, Y) ← parent(X, Y).
ancs(X, Z) ← ancs(X, Y), ancs(Y, Z).
r: δ'ancs(X, Z) ← ancs'(X, Y), ancs'(Y, Z).
r1: δ'ancs(X, Z) ← δancs(X, Y), ancs'(Y, Z).
r2: δ'ancs(X, Z) ← ancs(X, Y), ancs'(Y, Z).
Now, we can rewrite r2 as:
r2,1: δ'ancs(X, Z) ← ancs(X, Y), δancs(Y, Z).
r2,2: δ'ancs(X, Z) ← ancs(X, Y), ancs(Y, Z).
Rule r2,2 produces only 'old' values, and can be eliminated. We are left with rules r1 and r2,1:
δ'ancs(X, Z) ← δancs(X, Y), ancs'(Y, Z).
δ'ancs(X, Z) ← ancs(X, Y), δancs(Y, Z).
Seminaive Fixpoint (cont.)
Analogy with symbolic differentiation.
Performance improvements: it is typically the case that $n = |\delta S| \ll N = |S \cup \delta S|$. The original ancs rule, for instance, requires the equijoin of two relations of size $N$; after the differentiation we need to compute two equijoins, each joining a relation of size $n$ with one of size $N$.
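The two differentiated rules for the quadratic ancs program can be sketched in Python as below. The chain-shaped parent relation and the names join and seminaive_quadratic are assumptions made for this illustration. Each step performs two joins, one with the increment on the left and one with the increment on the right.

```python
# Hypothetical database: a chain a -> b -> c -> d -> e of parent facts.
PARENT = {("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")}

def join(left, right):
    """Equijoin on the shared middle argument: (X, Y) with (Y, Z) gives (X, Z)."""
    return {(x, z) for (x, y) in left for (y2, z) in right if y == y2}

def seminaive_quadratic(parent):
    s = set()                  # ancs: atoms known before the current step
    delta = set(parent)        # delta ancs: produced by the exit rule
    s_new = s | delta          # ancs' = ancs union delta ancs
    while delta:
        # First rule:  new atoms from  delta ancs(X, Y), ancs'(Y, Z)
        # Second rule: new atoms from  ancs(X, Y), delta ancs(Y, Z)
        delta_next = (join(delta, s_new) | join(s, delta)) - s_new
        s, delta = s_new, delta_next
        s_new = s | delta
    return s_new

ANCS = seminaive_quadratic(PARENT)
```

Neither join pairs old atoms with old atoms, which is exactly the work the eliminated rule would have repeated.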
General Nonlinear Rules
A recursive rule of rank k is as follows:
r: Q0 ← c0, Q1, c1, Q2, ..., Qk, ck
is rewritten as follows:
r1: δ'Q0 ← c0, δQ1, c1, Q2', ..., Qk', ck
r2: δ'Q0 ← c0, Q1, c1, δQ2, ..., Qk', ck
...
rk: δ'Q0 ← c0, Q1, c1, Q2, ..., δQk, ck
Thus the jth rule has the form:
rj: δ'Q0 ← ..., Q(j−1), δQj, Q(j+1)', ...
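The rewriting of a rank-k rule can be mimicked by a small Python generator of rule texts. This is a sketch under a stated assumption: in the jth rule, the recursive goals before position j use the old relation, goal j uses the increment δ, and the goals after it use the primed (updated) relation, which is the pattern that the two ancs rules follow. The function name differentiate is hypothetical, and the non-recursive goals c0, ..., ck are left out for brevity.

```python
def differentiate(head, goals):
    """Return the k differentiated rules for a rule with k recursive goals.

    head  -- name of the head predicate Q0
    goals -- names of the recursive goals Q1..Qk, in body order
    """
    rules = []
    for j in range(len(goals)):
        before = list(goals[:j])                  # old relations
        current = ["δ" + goals[j]]                # the increment
        after = [q + "'" for q in goals[j + 1:]]  # updated relations
        rules.append("δ'" + head + " ← " + ", ".join(before + current + after))
    return rules

ANCS_RULES = differentiate("ancs", ["ancs", "ancs"])
RANK3_RULES = differentiate("q0", ["q1", "q2", "q3"])
```

For the quadratic ancs rule this produces one rule with δancs in the first position and one with δancs in the second.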
Iterated Fixpoint Computation for program P stratified in n strata
Let $P_j$, $1 \le j \le n$, denote the rules with their head in the j-th stratum. Then let $M_j$ be inductively constructed as follows:
1. $M_0 = \emptyset$, and
2. $M_j = \Gamma_{P_j}^{\uparrow \omega}(M_{j-1})$.
The naïve fixpoint algorithm remains the same, but with $M := M_{j-1}$ and $\Gamma_P$ replaced by $\Gamma_{P_j}$.
Theorem: Let P be a positive program stratified in n strata, and let $M_n$ be the result produced by the iterated fixpoint computation. Then $M_n = \mathrm{lfp}(T_P)$.
For programs with negated goals, the computation by strata is necessary to produce the correct result (i.e., $M_n$ is the stable model for P; not discussed here).
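The stratum-by-stratum computation can be sketched in Python for a positive program. Everything named here is an assumption for illustration: atoms are encoded as (predicate, arg1, arg2) triples, the father fact is invented, and the database facts are treated as the lowest stratum so that the computation can start from the empty set.

```python
# Hypothetical facts, encoded as (predicate, arg1, arg2) atoms.
FACTS = {("father", "tom", "anne"),
         ("mother", "anne", "silvia"),
         ("mother", "silvia", "marc")}

def t_facts(m):
    """Stratum 1: the database facts (rules with empty bodies)."""
    return set(FACTS)

def t_parent(m):
    """Stratum 2: parent(X, Y) <- father(X, Y).  parent(X, Y) <- mother(X, Y)."""
    return {("parent", x, y) for (p, x, y) in m if p in ("father", "mother")}

def t_anc(m):
    """Stratum 3: the exit rule and the left-linear recursive rule for anc."""
    parent = {(x, y) for (p, x, y) in m if p == "parent"}
    anc = {(x, y) for (p, x, y) in m if p == "anc"}
    exit_atoms = {("anc", x, y) for (x, y) in parent}
    rec_atoms = {("anc", x, z) for (x, y) in anc for (y2, z) in parent if y == y2}
    return exit_atoms | rec_atoms

def saturate(t, m):
    """Naive fixpoint of the inflationary operator Gamma(S) = T(S) | S, started at M."""
    s = set(m)
    while True:
        s_next = t(s) | s
        if s_next == s:
            return s
        s = s_next

def iterated_fixpoint(strata):
    m = set()                  # M_0 is empty
    for t in strata:           # M_j saturates stratum j on top of M_{j-1}
        m = saturate(t, m)
    return m

MODEL = iterated_fixpoint([t_facts, t_parent, t_anc])
ANC = {(x, y) for (p, x, y) in MODEL if p == "anc"}
```

Each stratum is saturated before the next one starts, so a higher stratum only ever reads completed lower relations.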
Bottom-Up versus Top-Down Computation
Compiled Rules:
anc(X, Y) ← parent(X, Y).
anc(X, Z) ← anc(X, Y), parent(Y, Z).
parent(X, Y) ← father(X, Y).
parent(X, Y) ← mother(X, Y).
Database:
mother(anne, silvia).
mother(silvia, marc).
The differential fixpoint is computed in a bottom-up fashion. For a query ?anc(X, Y) this is optimal.
But for many queries, such as ?anc(marc, Y), we want to propagate the 'marc' constraint down into the computation. The same holds for the query forms ?anc($X, Y), ?anc(X, $Y), and ?anc($X, $Y).
Specialization for Left-linear Recursive Rules
?anc(tom, Desc).
anc(Old, Young) ← parent(Old, Young).
anc(Old, Young) ← anc(Old, Mid), parent(Mid, Young).
This is changed into:
?anc(tom, Desc).
anc(Old/tom, Young) ← parent(Old/tom, Young).
anc(Old/tom, Young) ← anc(Old/tom, Mid), parent(Mid, Young).
This is similar to the way query optimizers push selections inside recursion.
This works for left-linear rules with the query form ?anc($Someone, Desc).
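The effect of specialization can be sketched in Python: with the first argument fixed to the query constant, only the second argument needs to be stored, and the fixpoint never touches unrelated families. The sample facts and the function name descendants are assumptions for illustration.

```python
# Hypothetical database: parent(X, Y) facts; bob and carl are unrelated to tom.
PARENT = {("tom", "anne"), ("anne", "silvia"), ("silvia", "marc"), ("bob", "carl")}

def descendants(parent, who):
    """Evaluate the specialized program for the query ?anc(who, Desc)."""
    # anc(who, Young) <- parent(who, Young).
    delta = {young for (old, young) in parent if old == who}
    found = set(delta)
    while delta:
        # anc(who, Young) <- anc(who, Mid), parent(Mid, Young).
        delta = {young for (mid, young) in parent if mid in delta} - found
        found |= delta
    return found

TOM_DESC = descendants(PARENT, "tom")
```

The unspecialized program would also derive anc(bob, carl), which this query never needs.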
Right-linear rules
anc(Old, Young) ← parent(Old, Young).
anc(Old, Young) ← parent(Old, Mid), anc(Mid, Young).
Descendants of Tom: ?anc(tom, X).
This query can no longer be implemented by specializing the program. Solution: turn the rules into equivalent left-linear ones!
The situation is symmetric. A query such as ?anc(X, $Y) cannot be supported by specializing the left-linear version of the program, but that program can be transformed into the right-linear rules above, to which specialization applies.
For each left (right) linear program there exists an equivalent right (left) linear program, similar to regular grammars in programming languages.
Deductive database compilers perform this transformation.
The Magic Set Method
Specialization only works for left/right linear programs. It does not work in general, even for linear rules. The same-generation example:
sg(A, A).
sg(X, Y) ← parent(XP, X), sg(XP, YP), parent(YP, Y).
?sg(marc, Who).
This program cannot be computed in a bottom-up fashion because the exit rule is not safe.
We can compute a “magic” set containing all the ancestors of marc and add it as a goal to the two rules.
Magic Sets for non-recursive rules
Find the graduating seniors and their parents' addresses:
spa(SN, PN, Paddr) ← senior(SN), parent(SN, PN), address(PN, Paddr).
senior(SN) ← student(SN, _, senior), graduating(SN).
To find the address of the parent named 'Joe Doe':
?spa(SN, 'Joe Doe', Paddr).
Suppose that computing parent(X, $Y) is safe and not too expensive.
Magic Set Rewriting
spa_q('Joe Doe').
m.senior(SN) ← spa_q(PN), parent(SN, PN).
senior(SN) ← m.senior(SN), student(SN, _, senior), graduating(SN).
The rest remains unchanged:
spa(SN, PN, Paddr) ← senior(SN), parent(SN, PN), address(PN, Paddr).
?spa(SN, 'Joe Doe', Paddr).
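The rewritten non-recursive program can be traced in Python with set comprehensions. All the data below is invented for illustration, and the sketch assumes that spa_q holds the bound parent name from the query, so that the magic predicate m.senior collects only the students who have that parent.

```python
# Hypothetical database relations.
STUDENT = {("ann", "cs", "senior"), ("bob", "ee", "senior"), ("cat", "cs", "junior")}
GRADUATING = {"ann", "bob"}
PARENT = {("ann", "Joe Doe"), ("bob", "Mary Roe"), ("cat", "Joe Doe")}  # (student, parent)
ADDRESS = {("Joe Doe", "1 Main St"), ("Mary Roe", "2 Oak Ave")}

# spa_q: the constant bound in the query.
spa_q = {"Joe Doe"}

# m.senior(SN) <- spa_q(PN), parent(SN, PN).
m_senior = {sn for (sn, pn) in PARENT if pn in spa_q}

# senior(SN) <- m.senior(SN), student(SN, _, senior), graduating(SN).
senior = {sn for (sn, _, year) in STUDENT
          if sn in m_senior and year == "senior" and sn in GRADUATING}

# spa(SN, PN, Paddr) <- senior(SN), parent(SN, PN), address(PN, Paddr).
spa = {(sn, pn, addr)
       for sn in senior
       for (sn2, pn) in PARENT if sn2 == sn
       for (pn2, addr) in ADDRESS if pn2 == pn}

# ?spa(SN, 'Joe Doe', Paddr).
ANSWER = {row for row in spa if row[1] == "Joe Doe"}
```

Without the magic goal, senior would also contain bob, whose parent is irrelevant to the query.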
The Same Generation Example
sg(A, A).
sg(X, Y) ← parent(XP, X), sg(XP, YP), parent(YP, Y).
?sg(marc, Who).
This program cannot be computed in a bottom-up fashion because the exit rule is not safe.
We can compute a “magic” set containing all the ancestors of marc and add it as a goal to the two rules.
The magic set computation utilizes the bound arguments and goals in the rules. The first argument of sg is bound in the query. Thus X is bound, and through the goal parent(XP, X) the binding is passed to XP in the recursive goal. The variables Y and YP remain unbound.
Magic Sets (Cont.)
Magic set rules:
m.sg(marc).
m.sg(XP) ← m.sg(X), parent(XP, X).
Transformed rules:
sg(X, X) ← m.sg(X).
sg(X, Y) ← parent(XP, X), sg(XP, YP), parent(YP, Y), m.sg(X).
Query: ?sg(marc, Who).
The rules for the magic predicates are built by using:
(1) the query constant as the exit rule (a fact);
(2) the bound arguments and predicates from the recursive rules, but with the head and tail switched!
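The magic set rules and the transformed sg rules can be evaluated bottom-up as in the following Python sketch. The family facts and the function names magic_sg and same_generation are hypothetical, parent pairs are stored as (parent, child), and a plain naive loop is used for the transformed rules to keep the example short.

```python
# Hypothetical database: parent(P, C) facts as (parent, child) pairs.
PARENT = {("ann", "marc"), ("ann", "lucy"), ("bob", "ann"),
          ("bob", "carl"), ("carl", "dina"), ("zed", "yan")}

def magic_sg(parent, start):
    """m.sg(start).  m.sg(XP) <- m.sg(X), parent(XP, X)."""
    magic = {start}
    frontier = {start}
    while frontier:
        frontier = {xp for (xp, x) in parent if x in frontier} - magic
        magic |= frontier
    return magic

def same_generation(parent, start):
    magic = magic_sg(parent, start)
    sg = {(x, x) for x in magic}        # sg(X, X) <- m.sg(X).
    while True:
        # sg(X, Y) <- parent(XP, X), sg(XP, YP), parent(YP, Y), m.sg(X).
        new = {(x, y)
               for (xp, x) in parent if x in magic
               for (xp2, yp) in sg if xp2 == xp
               for (yp2, y) in parent if yp2 == yp}
        if new <= sg:                   # nothing new: fixpoint reached
            break
        sg |= new
    return {y for (x, y) in sg if x == start}   # ?sg(start, Who)

MAGIC = magic_sg(PARENT, "marc")
WHO = same_generation(PARENT, "marc")
```

The exit rule is now safe because X ranges over the finite magic set, and the unrelated zed and yan never enter the computation.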
Recursive Methods
There are many other recursive methods, but the magic set method is the most general and the most widely used in deductive systems, including LDL++.