CS240A: Databases and Knowledge Bases From Differential Fixpoints to Magic Sets


Carlo Zaniolo

Department of Computer Science

University of California, Los Angeles

January, 2002

Notes from Chapter 9 of Advanced Database Systems by Zaniolo, Ceri, Faloutsos, Snodgrass, Subrahmanian, and Zicari (Morgan Kaufmann, 1997)

Recursive Predicates

r1: anc(X, Y) ← parent(X, Y).

r2: anc(X, Z) ← anc(X, Y), parent(Y, Z).

r2 is a recursive rule, specifically a left-linear one.

r1 is a nonrecursive rule defining a recursive predicate; such a rule is called an exit rule.

An alternative definition for anc:

r3: anc(X, Y) ← parent(X, Y).

r4: anc(X, Z) ← anc(X, Y), anc(Y, Z).

Here r4 is a quadratic rule.

Fixpoint Computation

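The inflationary immediate consequence operator for P:

$\Gamma_P(I) = T_P(I) \cup I$

We have:

$\Gamma_P^{\uparrow n}(\emptyset) = T_P^{\uparrow n}(\emptyset)$

$\mathrm{lfp}(T_P) = T_P^{\uparrow \omega}(\emptyset) = \mathrm{lfp}(\Gamma_P) = \Gamma_P^{\uparrow \omega}(\emptyset)$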

Fixpoint Computation (cont.)

Naïve Fixpoint Algorithm for P (M = ∅ for now)

{ S := M; S′ := Γ_P(M);

while S ⊂ S′ { S := S′;

S′ := Γ_P(S) } }

We can replace the first Γ_P with Γ_E and the second one with Γ_R, which denote the immediate consequence operators for the exit rules and the recursive rules, respectively.
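The naïve loop can be sketched in Python for the anc program (rules r1 and r2). This is only an illustration: relations are modeled as sets of pairs, and the names `t_p`, `naive_fixpoint`, and the sample parent facts are assumptions made for the example.

```python
def t_p(parent, anc):
    """Immediate consequence operator T_P for rules r1 and r2."""
    # r1: anc(X, Y) <- parent(X, Y).
    exit_atoms = set(parent)
    # r2: anc(X, Z) <- anc(X, Y), parent(Y, Z).
    rec_atoms = {(x, z) for (x, y) in anc for (y2, z) in parent if y == y2}
    return exit_atoms | rec_atoms


def naive_fixpoint(parent):
    s = set()                        # S := M, with M empty
    s_next = t_p(parent, s) | s      # S' := Gamma_P(M)
    while s != s_next:               # Gamma_P is inflationary, so this means S is a proper subset of S'
        s = s_next                   # S := S'
        s_next = t_p(parent, s) | s  # S' := Gamma_P(S)
    return s


# Hypothetical sample facts.
parent = {("anne", "silvia"), ("silvia", "marc")}
anc = naive_fixpoint(parent)
```

Every pass re-derives all atoms found so far, which is the redundancy that the differential fixpoint removes.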

Differential Fixpoint (a.k.a. Seminaive Computation)

Redundant computation: the jth iteration step also recomputes all atoms obtained in the (j-1)th step. Finite-difference techniques trace the derivations over two steps:

1. S is the set of atoms obtained up to step j-1.

2. S′ is the set of atoms obtained up to step j.

3. δS = Γ_R(S) - S = T_R(S) - S denotes the new atoms at step j (i.e., the atoms that were not in S at step j-1).

4. δ′S = Γ_R(S′) - S′ = T_R(S′) - S′ denotes the new atoms obtained at step j+1.

Differential Fixpoint Algorithm (M = ∅ for now)

{ S := M; δS := T_E(M);

S′ := S ∪ δS; while δS ≠ ∅

{ δ′S := T_R(S′) - S′;

S := S′; δS := δ′S;

S′ := S ∪ δS } }

Here anc, δanc, and anc′ denote the ancestor atoms that are in S, δS, and S′ = S ∪ δS, respectively.
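A direct Python transcription of this loop for the left-linear anc program might look as follows. It is a sketch under assumptions: relations are sets of pairs, and `t_e`, `t_r`, `differential_fixpoint`, and the sample facts are names chosen for illustration.

```python
def t_e(parent):
    """T_E, exit rule: anc(X, Y) <- parent(X, Y)."""
    return set(parent)


def t_r(parent, anc):
    """T_R, recursive rule: anc(X, Z) <- anc(X, Y), parent(Y, Z)."""
    return {(x, z) for (x, y) in anc for (y2, z) in parent if y == y2}


def differential_fixpoint(parent):
    s = set()                    # S := M (empty)
    delta = t_e(parent)          # delta S := T_E(M)
    s_next = s | delta           # S' := S union delta S
    while delta:                 # while delta S is not empty
        new_delta = t_r(parent, s_next) - s_next   # delta' S := T_R(S') - S'
        s = s_next               # S := S'
        delta = new_delta        # delta S := delta' S
        s_next = s | delta       # S' := S union delta S
    return s_next


# Hypothetical sample facts.
parent = {("anne", "silvia"), ("silvia", "marc")}
anc = differential_fixpoint(parent)
```

Here T_R is still applied to all of S′; the loop only tracks which atoms are new.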

Rule Differentiation

To compute δ′S := T_R(S′) - S′ we can use a T_R defined by the following rule:

δ′anc(X, Z) ← anc′(X, Y), parent(Y, Z).

This can be rewritten as:

δ′anc(X, Z) ← δanc(X, Y), parent(Y, Z).

δ′anc(X, Z) ← anc(X, Y), parent(Y, Z).

The second rule can now be eliminated, since it produces only atoms that were already contained in anc′, i.e., in the S′ computed in the previous iteration.

Thus, for linear rules, replace δ′S := T_R(S′) - S′ by

δ′S := T_R(δS) - S′

For nonlinear rules the rewriting is more complex.
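For the linear case, the effect of applying the recursive rule to the new atoms only can be sketched in Python. The function `closure`, its `differentiated` flag, and the `joined` counter (a rough measure of how many anc atoms are fed to the join) are assumptions introduced for this comparison.

```python
def join_parent(atoms, parent):
    """Body of the recursive rule: atoms(X, Y), parent(Y, Z) gives (X, Z)."""
    return {(x, z) for (x, y) in atoms for (y2, z) in parent if y == y2}


def closure(parent, differentiated):
    s_next = set(parent)     # S' after the exit rule
    delta = set(parent)      # delta S
    joined = 0               # anc atoms fed to the join, summed over all steps
    while delta:
        # Differentiated rule joins only delta S; the original joins all of S'.
        source = delta if differentiated else s_next
        joined += len(source)
        delta = join_parent(source, parent) - s_next
        s_next = s_next | delta
    return s_next, joined


# Hypothetical data: a chain 0 -> 1 -> ... -> 6.
parent = {(i, i + 1) for i in range(6)}
full, work_full = closure(parent, differentiated=False)
diff, work_diff = closure(parent, differentiated=True)
```

Both variants return the same relation, while the differentiated one feeds fewer atoms to the join on this input.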

Non Linear Rules

ancs(X, Y) ← parent(X, Y).

ancs(X, Z) ← ancs(X, Y), ancs(Y, Z).

r: δ′ancs(X, Z) ← ancs′(X, Y), ancs′(Y, Z).

r1: δ′ancs(X, Z) ← δancs(X, Y), ancs′(Y, Z).

r2: δ′ancs(X, Z) ← ancs(X, Y), ancs′(Y, Z).

Now, we can rewrite r2 as:

r2,1: δ′ancs(X, Z) ← ancs(X, Y), δancs(Y, Z).

r2,2: δ′ancs(X, Z) ← ancs(X, Y), ancs(Y, Z).

Rule r2,2 produces only 'old' values, and can be eliminated. We are left with rules r1 and r2,1:

r1: δ′ancs(X, Z) ← δancs(X, Y), ancs′(Y, Z).

r2,1: δ′ancs(X, Z) ← ancs(X, Y), δancs(Y, Z).
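The two surviving rules for the quadratic program can be evaluated as in the following Python sketch. The names `join` and `seminaive_ancs` and the sample chain are assumptions for illustration; `s`, `delta`, and `s_next` play the roles of ancs, δancs, and ancs′.

```python
def join(left, right):
    """left(X, Y), right(Y, Z) gives (X, Z)."""
    return {(x, z) for (x, y) in left for (y2, z) in right if y == y2}


def seminaive_ancs(parent):
    s = set()               # ancs: atoms up to step j-1
    delta = set(parent)     # delta ancs: produced by the exit rule
    s_next = s | delta      # ancs': atoms up to step j
    while delta:
        # r1:   delta' ancs(X, Z) <- delta ancs(X, Y), ancs'(Y, Z).
        # r2,1: delta' ancs(X, Z) <- ancs(X, Y), delta ancs(Y, Z).
        new_delta = (join(delta, s_next) | join(s, delta)) - s_next
        s, delta = s_next, new_delta
        s_next = s | delta
    return s_next


# Hypothetical data: a chain a -> b -> c -> d -> e.
parent = {("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")}
ancs = seminaive_ancs(parent)
```

Each join has δancs on one side, so no step joins two full relations.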



Seminaive Fixpoint (cont.)

Analogy with symbolic differentiation.

Performance improvements: it is typically the case that n = |δS| << N = |S ∪ δS|. The original ancs rule, for instance, requires the equijoin of two relations of size N; after the differentiation we need to compute two equijoins, each joining a relation of size n with one of size N.
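As a rough arithmetic illustration, the sketch below counts candidate pairs under a nested-loop cost model. The cost model, the function name `nested_loop_pairs`, and the sizes N = 1000 and n = 10 are all assumptions; a real system would typically use an indexed or hash-based equijoin.

```python
def nested_loop_pairs(left_size, right_size):
    """Candidate pairs examined by a nested-loop join (crude cost model)."""
    return left_size * right_size


N = 1000   # assumed size of S union delta S
n = 10     # assumed size of delta S

original = nested_loop_pairs(N, N)            # one join of size N with size N
differentiated = 2 * nested_loop_pairs(n, N)  # two joins of size n with size N
ratio = original // differentiated
```

With these assumed sizes the original rule examines 1,000,000 pairs and the differentiated rules examine 20,000.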

General Nonlinear Rules

A recursive rule of rank k is as follows:

r: Q0 ← c0, Q1, c1, Q2, ..., Qk, ck

It is rewritten as follows:

r1: δ′Q0 ← c0, δQ1, c1, Q′2, ..., Q′k, ck

r2: δ′Q0 ← c0, Q1, c1, δQ2, ..., Q′k, ck

...

rk: δ′Q0 ← c0, Q1, c1, Q2, ..., δQk, ck

Thus the jth rule has the form:

rj: δ′Q0 ← ..., Qj-1, ..., δQj, ..., Q′j+1, ...
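The pattern of the rewriting can be generated mechanically, as in this Python sketch. It handles only the recursive goals Q1..Qk and omits the nonrecursive goals c0..ck; the function name `differentiate` and the string encoding (a "δ" prefix for the delta goal, a trailing apostrophe for primed goals) are assumptions for illustration.

```python
def differentiate(goals):
    """Return the k rule bodies obtained from the recursive goals Q1..Qk.

    In the j-th body, goals before position j are unchanged (old atoms),
    the goal at position j is the delta version, and goals after it are primed.
    """
    bodies = []
    for j in range(len(goals)):
        body = []
        for i, q in enumerate(goals):
            if i < j:
                body.append(q)          # unprimed: atoms up to step j-1
            elif i == j:
                body.append("δ" + q)    # delta: atoms new at step j
            else:
                body.append(q + "'")    # primed: atoms up to step j
        bodies.append(body)
    return bodies


rules = differentiate(["Q1", "Q2", "Q3"])
```

For the quadratic ancs rule, `differentiate(["ancs", "ancs"])` yields the bodies of r1 and r2,1.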

Iterated Fixpoint Computation for program P stratified in n strata

Let $P_j$, $1 \le j \le n$, denote the rules with their head in the $j$-th stratum. Then, let $M_j$ be inductively constructed as follows:

1. $M_0 = \emptyset$, and

2. $M_j = \Gamma_{P_j}^{\uparrow \omega}(M_{j-1})$.

The naïve fixpoint algorithm remains the same, but with $M := M_{j-1}$ and $P$ replaced by $P_j$.

Theorem: Let $P$ be a positive program stratified in $n$ strata, and let $M_n$ be the result produced by the iterated fixpoint computation. Then $M_n = \mathrm{lfp}(T_P)$.

For programs with negated goals, the computation by strata is necessary to produce the correct result (i.e., $M_n$ is the stable model for $P$, which is not discussed here).
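The stratum-by-stratum construction can be sketched in Python for a small positive program with two strata: parent defined from father and mother facts, then anc defined from parent. The atom encoding as tuples, the function names, the sample facts, and the choice to place the facts in the first stratum are assumptions for this illustration.

```python
# Hypothetical database facts, treated as rules with empty bodies.
FACTS = {("mother", "anne", "silvia"), ("mother", "silvia", "marc")}


def t_stratum1(i):
    """T for stratum 1: the facts, plus parent(X, Y) <- father(X, Y) or mother(X, Y)."""
    derived = {("parent", x, y) for (p, x, y) in i if p in ("father", "mother")}
    return FACTS | derived


def t_stratum2(i):
    """T for stratum 2: the two anc rules."""
    par = {(x, y) for (p, x, y) in i if p == "parent"}
    anc = {(x, y) for (p, x, y) in i if p == "anc"}
    exit_atoms = {("anc", x, y) for (x, y) in par}
    rec_atoms = {("anc", x, z) for (x, y) in anc for (y2, z) in par if y == y2}
    return exit_atoms | rec_atoms


def inflationary_fixpoint(t, m):
    """Iterate Gamma(S) = T(S) union S from M until nothing changes."""
    s = set(m)
    while True:
        s_next = s | t(s)
        if s_next == s:
            return s
        s = s_next


m = set()                                  # M_0 is empty
for t in (t_stratum1, t_stratum2):         # M_j is computed from M_{j-1}
    m = inflationary_fixpoint(t, m)
anc_atoms = {a for a in m if a[0] == "anc"}
```

Each stratum starts from the result of the previous one, so lower-stratum predicates are complete before they are used.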

Bottom-Up versus Top-Down Computation

Compiled rules:

anc(X, Y) ← parent(X, Y).

anc(X, Z) ← anc(X, Y), parent(Y, Z).

parent(X, Y) ← father(X, Y).

parent(X, Y) ← mother(X, Y).

Database:

mother(anne, silvia).

mother(silvia, marc).

The differential fixpoint is computed in a bottom-up fashion. For a query ?anc(X, Y) this is optimal.

But many queries are of the form ?anc(marc, Y), where we want to propagate the 'marc' constraint down into the computation. The same holds for the query forms ?anc($X, Y), ?anc(X, $Y), and ?anc($X, $Y).

Specialization for Left-linear Recursive Rules

?anc(tom, Desc).

anc(Old, Young) ← parent(Old, Young).

anc(Old, Young) ← anc(Old, Mid), parent(Mid, Young).

This is changed into:

?anc(tom, Desc).

anc(Old/tom, Young) ← parent(Old/tom, Young).

anc(Old/tom, Young) ← anc(Old/tom, Mid), parent(Mid, Young).

This is similar to pushing selections inside recursion in query optimizers.

This works for left-linear rules with the query form ?anc($Someone, Desc).
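Once the first argument is fixed to the query constant, the specialized program only needs to track the second argument. The Python sketch below illustrates this; `descendants_of` and the sample facts are hypothetical names and data.

```python
def descendants_of(person, parent):
    """Evaluate the specialized left-linear rules for ?anc(person, Desc)."""
    # anc(person, Young) <- parent(person, Young).
    delta = {young for (old, young) in parent if old == person}
    found = set(delta)
    while delta:
        # anc(person, Young) <- anc(person, Mid), parent(Mid, Young).
        delta = {young for (mid, young) in parent if mid in delta} - found
        found |= delta
    return found


# Hypothetical facts; sue and joe are unrelated to tom.
parent = {("tom", "ann"), ("ann", "bob"), ("sue", "joe")}
desc = descendants_of("tom", parent)
```

Atoms about unrelated people, such as anc(sue, joe), are never derived.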

Right-linear rules

anc(Old, Young) ← parent(Old, Young).

anc(Old, Young) ← parent(Old, Mid), anc(Mid, Young).

Descendants of tom: ?anc(tom, X)

This query can no longer be implemented by specializing the program. Solution: turn the rules into equivalent left-linear ones!

The situation is symmetric. A query such as ?anc(X, $Y) cannot be supported by specializing the left-linear version of the program. But that program can be transformed into the right-linear rules above, to which specialization applies.

For each left-linear (right-linear) program there exists an equivalent right-linear (left-linear) program, similar to regular grammars in programming languages.

Deductive database compilers perform this transformation.

The Magic Set Method

Specialization only works for left-linear and right-linear programs. It does not work in general, even for linear rules. The same-generation example:

sg(A, A).

sg(X, Y) ← parent(XP, X), sg(XP, YP), parent(YP, Y).

?sg(marc, Who).

This program cannot be computed in a bottom-up fashion because the exit rule is not safe.

We can compute a “magic” set containing all the ancestors of marc and add a goal on this set to each of the two rules.

Magic Sets for Non-recursive Rules

Find the graduating seniors and their parents' addresses:

spa(SN, PN, Paddr) ← senior(SN), parent(SN, PN), address(PN, Paddr).

senior(SN) ← student(SN, _, senior), graduating(SN).

To find the address of the parent named 'Joe Doe':

?spa(SN, 'Joe Doe', Paddr)

Suppose that computing parent(X, $Y) is safe and not too expensive.

Magic Set Rewriting

spa_q('Joe Doe').

m.senior(SN) ← spa_q(PN), parent(SN, PN).

senior(SN) ← m.senior(SN), student(SN, _, senior), graduating(SN).

The rest remains unchanged:

spa(SN, PN, Paddr) ← senior(SN), parent(SN, PN), address(PN, Paddr).

?spa(SN, 'Joe Doe', Paddr).
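The rewritten non-recursive program can be evaluated with plain set comprehensions, as in the Python sketch below. It assumes the magic rule binds the parent name, i.e. m.senior(SN) ← spa_q(PN), parent(SN, PN); all facts are hypothetical sample data.

```python
# Hypothetical facts.
student = {("ann", "cs", "senior"), ("bob", "ee", "senior"), ("cat", "cs", "junior")}
graduating = {"ann", "bob"}
parent = {("ann", "Joe Doe"), ("bob", "Mary Roe"), ("cat", "Joe Doe")}
address = {("Joe Doe", "1 Main St"), ("Mary Roe", "2 Oak Ave")}

# spa_q('Joe Doe').
spa_q = {"Joe Doe"}

# m.senior(SN) <- spa_q(PN), parent(SN, PN).
m_senior = {sn for (sn, pn) in parent if pn in spa_q}

# senior(SN) <- m.senior(SN), student(SN, _, senior), graduating(SN).
senior = {sn for (sn, _, year) in student
          if sn in m_senior and year == "senior" and sn in graduating}

# spa(SN, PN, Paddr) <- senior(SN), parent(SN, PN), address(PN, Paddr).
spa = {(sn, pn, paddr)
       for sn in senior
       for (sn2, pn) in parent if sn2 == sn
       for (pn2, paddr) in address if pn2 == pn}

# ?spa(SN, 'Joe Doe', Paddr).
answer = {t for t in spa if t[1] == "Joe Doe"}
```

The magic set `m_senior` keeps senior from being computed for students whose parent is not named in the query.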

The Same Generation Example

sg(A, A).

sg(X, Y) ← parent(XP, X), sg(XP, YP), parent(YP, Y).

?sg(marc, Who).

This program cannot be computed in a bottom-up fashion because the exit rule is not safe.

We can compute a “magic” set containing all the ancestors of marc and add a goal on this set to each of the two rules.

The magic set computation uses the bound arguments and the goals that pass bindings in the rules. The first argument of sg is bound in the query. Thus X is bound, and through the goal parent(XP, X) the binding is passed to XP in the recursive goal. The variables Y and YP remain unbound.

Magic Sets (Cont.)

Magic set rules:

m.sg(marc).

m.sg(XP) ← m.sg(X), parent(XP, X).

Transformed rules:

sg(X, X) ← m.sg(X).

sg(X, Y) ← parent(XP, X), sg(XP, YP), parent(YP, Y), m.sg(X).

Query: ?sg(marc, Who).

The rules for the magic predicates are built by using:

(1) the query constant as the exit rule (a fact).

(2) the bound arguments and predicates from the recursive rules, but with the head and tail switched!
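The magic set rules and the transformed sg rules can be evaluated bottom-up as in this Python sketch. The function name `same_generation` and the sample facts are assumptions; parent is modeled as a set of (parent, child) pairs, matching parent(XP, X).

```python
def same_generation(person, parent):
    """Answer ?sg(person, Who) using the magic set rewriting."""
    # m.sg(person).   m.sg(XP) <- m.sg(X), parent(XP, X).
    magic = {person}
    delta = {person}
    while delta:
        delta = {xp for (xp, x) in parent if x in delta} - magic
        magic |= delta

    # sg(X, X) <- m.sg(X).
    sg = {(x, x) for x in magic}
    delta_sg = set(sg)
    while delta_sg:
        # sg(X, Y) <- parent(XP, X), sg(XP, YP), parent(YP, Y), m.sg(X).
        # The rule is linear in sg, so joining only the new sg atoms suffices.
        delta_sg = {(x, y)
                    for (xp, yp) in delta_sg
                    for (xp2, x) in parent if xp2 == xp and x in magic
                    for (yp2, y) in parent if yp2 == yp} - sg
        sg |= delta_sg
    return {y for (x, y) in sg if x == person}


# Hypothetical facts: anne has children silvia and bob, who have marc and tim.
parent = {("anne", "silvia"), ("anne", "bob"), ("silvia", "marc"), ("bob", "tim")}
who = same_generation("marc", parent)
```

The exit rule is now safe because X ranges over the finite magic set, and sg atoms are derived only for marc and his ancestors.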

Recursive Methods

There are many other recursive methods, but the magic set method is the most general and the most widely used in deductive systems, including LDL++.
