Symmetry Detection in Constraint Satisfaction Problems & Its Application in Databases
Fall 2009, Advanced Constraint Programming
Symmetry Detection in Constraint Satisfaction Problems
& Its Application in Databases
Berthe Y. Choueiry
Constraint Systems Laboratory
Department of Computer Science & Engineering
University of Nebraska-Lincoln
Joint work with Amy Beckwith-Davis, Anagh Lal, and Eugene C. Freuder
Supported by NSF CAREER award #0133568
December 9, 2005
Outline
• Definitions
  – CSP
  – Interchangeability
  – Bundling
• Bundling in CSPs
• Bundling for join query computation
• Conclusions
Constraint Satisfaction Problem (CSP)
• Given P = (V, D, C)
  – V : set of variables
  – D : set of their domains
  – C : set of constraints (relations) restricting the acceptable combinations of values for variables
  – Solution is a consistent assignment of values to variables
• Query: find 1 solution, all solutions, etc.
• Examples: SAT, scheduling, product configuration
• NP-complete in general
[Figure: constraint graph of the running example with domains V1 {d}, V2 {c, d, e, f}, V3 {a, b, d}, V4 {a, b, c}]
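A minimal Python sketch of the (V, D, C) definition, using the domains of the running example. The edges of the constraint graph did not survive in this transcript, so the four "not equal" constraints below are an assumption, chosen only because they are consistent with the solutions quoted on later slides; they are not the authors' stated model.

```python
from itertools import product

# Domains of the running example (taken from the slides).
domains = {
    "V1": ["d"],
    "V2": ["c", "d", "e", "f"],
    "V3": ["a", "b", "d"],
    "V4": ["a", "b", "c"],
}

# ASSUMPTION: binary "not equal" constraints on these variable pairs.
neq_edges = [("V1", "V3"), ("V2", "V3"), ("V2", "V4"), ("V3", "V4")]


def is_consistent(assignment):
    """True if a (possibly partial) assignment violates no assumed constraint."""
    return all(
        assignment[x] != assignment[y]
        for x, y in neq_edges
        if x in assignment and y in assignment
    )


def all_solutions():
    """Enumerate every complete consistent assignment by brute force."""
    names = list(domains)
    for values in product(*(domains[v] for v in names)):
        candidate = dict(zip(names, values))
        if is_consistent(candidate):
            yield candidate


solutions = list(all_solutions())
```

Brute-force enumeration is exponential in the number of variables; it is used here only to make the notion of "solution" concrete.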
Backtrack search
• DFS + backtracking (linear space)
  – Variable being instantiated: current variable
  – Un-instantiated variables: future variables
  – Instantiated variables: past variables
• + Constraint propagation
  – Backtrack search with forward checking (FC)
[Figure: search tree rooted at S over V1, V2, V3, V4. V1 is assigned d; V2 branches on c, e, f, d. Domains shown: V2 {c, d, e, f}, V3 {a, b, d}, V4 {a, b, c}]
Solution: V1 = d, V2 = e, V3 = a, V4 = c
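Backtrack search with forward checking can be sketched as follows. This is an illustrative Python version, not the authors' implementation; as before, the "not equal" constraints are an assumption, since the figure's edges are not recoverable from the transcript.

```python
domains = {
    "V1": ["d"],
    "V2": ["c", "d", "e", "f"],
    "V3": ["a", "b", "d"],
    "V4": ["a", "b", "c"],
}
# ASSUMPTION: binary "not equal" constraints on these pairs.
neq_edges = [("V1", "V3"), ("V2", "V3"), ("V2", "V4"), ("V3", "V4")]

neighbors = {v: set() for v in domains}
for x, y in neq_edges:
    neighbors[x].add(y)
    neighbors[y].add(x)


def forward_check(var, value, future):
    """Filter the domains of future variables against var=value.

    Returns the filtered domains, or None if some domain is wiped out.
    """
    pruned = {}
    for other, dom in future.items():
        if other in neighbors[var]:
            dom = [w for w in dom if w != value]
            if not dom:
                return None
        pruned[other] = dom
    return pruned


def fc_search(order, future, past=None):
    """Depth-first search in the given variable order, yielding all solutions."""
    past = past or {}
    if not order:
        yield dict(past)
        return
    current, rest = order[0], order[1:]
    for value in future[current]:
        remaining = {v: future[v] for v in rest}
        pruned = forward_check(current, value, remaining)
        if pruned is None:
            continue  # dead end detected before descending
        yield from fc_search(rest, pruned, {**past, current: value})


first = next(fc_search(["V1", "V2", "V3", "V4"], domains))
```

Because every future domain is filtered as soon as a value is assigned, a value reached by the search is already consistent with all past variables, so no backward checks are needed.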
Interchangeability [Freuder, 91]
• Captures the idea of symmetry between solutions
• Functional interchangeability
  – Any mapping between two solutions
  – Including permutation of values across variables, equivalent to graph isomorphism
• Full interchangeability (FI)
  – Restricted to values of a single variable
  – Also, likely intractable
[Figure: on the running example, two solutions that map onto each other: (V1 = d, V2 = c, V3 = a, V4 = b) and (V1 = d, V2 = c, V3 = b, V4 = a); for V2, the values {d, e, f} are interchangeable in every solution]
Value interchangeability [Freuder, 91]
• Full Interchangeability (FI):
  – d, e, f interchangeable for V2 in any solution
• Neighborhood Interchangeability (NI):
  – Considers only the neighborhood of the variable
  – Finds e, f but misses d
  – Efficiently approximates FI
  – Discrimination tree DT(V2)
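A rough Python sketch of how NI sets can be computed for one variable. Instead of building an actual discrimination tree, it groups values by the set of <neighbor, value> pairs they are compatible with, which yields the same partition. The "not equal" constraints are an assumption made for illustration; the slide's constraint graph is not recoverable here.

```python
from collections import defaultdict

domains = {
    "V1": ["d"],
    "V2": ["c", "d", "e", "f"],
    "V3": ["a", "b", "d"],
    "V4": ["a", "b", "c"],
}
# ASSUMPTION: binary "not equal" constraints on these pairs.
neq_edges = [("V1", "V3"), ("V2", "V3"), ("V2", "V4"), ("V3", "V4")]


def neighbors_of(var):
    """Variables sharing an (assumed) constraint with var."""
    return [y if x == var else x for x, y in neq_edges if var in (x, y)]


def ni_sets(var, doms):
    """Partition doms[var] into neighborhood-interchangeable sets.

    Two values land in the same set when they are compatible with exactly
    the same <neighbor, value> pairs.
    """
    groups = defaultdict(list)
    for a in doms[var]:
        signature = frozenset(
            (n, b) for n in neighbors_of(var) for b in doms[n] if a != b
        )
        groups[signature].append(a)
    return sorted(groups.values())


static_v2 = ni_sets("V2", domains)
```

Under these assumed constraints, d is separated from e and f only because of the value d in the domain of V3, which matches the slide's remark that NI finds e, f but misses d.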
Outline
• Definitions
• Bundling in CSPs
  – Static bundling
  – Dynamic bundling
  – Dynamic bundling for non-binary CSPs
• Bundling for join query computation
• Conclusions
Bundling: using NI in search
• Static bundling [Haselböck, 93]
  – Before search: compute & store NI sets
  – During search:
    • Future variables: remove bundle of equivalent values
    • Current variable: assign a bundle of equivalent values
• Advantages
  – Reduces search space
  – Creates bundled solutions
[Figure: static bundling search tree rooted at S. V1 is assigned d; V2 branches on c, {e, f}, d. Bundled solution: V1 = d, V2 = {e, f}, V3 = a, V4 = {b, c}]
Dynamic bundling (DynBndl) [2001]
• Dynamically identifies NI
• Using discrimination tree for forward checking:
  – is never less efficient than BT & static bundling
[Figure: search trees rooted at S with V1 = d. Static bundling branches V2 on c, {e, f}, d; dynamic bundling branches V2 on c, {d, e, f}. Discrimination tree for V2 with annotations <V3,a>, <V3,b>, <V3,d>, <V4,a>, <V4,b>, <V4,c> and leaves V2,{d, e, f} and V2,{c}]
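The difference between static and dynamic bundling can be sketched in Python by recomputing NI sets after an assignment has been propagated. The code below is illustrative only: the "not equal" constraints are an assumption, and signature grouping stands in for the discrimination tree.

```python
from collections import defaultdict

domains = {
    "V1": ["d"],
    "V2": ["c", "d", "e", "f"],
    "V3": ["a", "b", "d"],
    "V4": ["a", "b", "c"],
}
# ASSUMPTION: binary "not equal" constraints on these pairs.
neq_edges = [("V1", "V3"), ("V2", "V3"), ("V2", "V4"), ("V3", "V4")]


def neighbors_of(var):
    return [y if x == var else x for x, y in neq_edges if var in (x, y)]


def ni_sets(var, doms):
    """Group values of var by the <neighbor, value> pairs they support."""
    groups = defaultdict(list)
    for a in doms[var]:
        signature = frozenset(
            (n, b) for n in neighbors_of(var) for b in doms[n] if a != b
        )
        groups[signature].append(a)
    return sorted(groups.values())


def assign(var, value, doms):
    """Domains after var=value followed by forward checking on var's neighbors."""
    new = {v: list(d) for v, d in doms.items()}
    new[var] = [value]
    for n in neighbors_of(var):
        new[n] = [b for b in new[n] if b != value]
    return new


static = ni_sets("V2", domains)                          # before search
dynamic = ni_sets("V2", assign("V1", "d", domains))      # after V1 = d
```

Once V1 = d has removed d from the domain of V3, nothing distinguishes d from e and f for V2, so the dynamically computed bundle is larger than the static one.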
Non-binary CSPs
Constraint Variable
C1         | C2    | C3         | C4
V  V1  V2  | V  V3 | V2  V3  V4 | V1  V4
1  1   3   | 1  3  | 1   2   1  | 1   1
1  3   3   | 2  3  | 1   2   2  | 2   2
2  1   3   | 3  2  | 2   2   1  | 3   1
2  3   3   | 4  2  | 2   2   2  |
3  1   1   | 4  2  | 3   1   1  |
3  2   2   | 6  1  |            |
4  1   1   |       |            |
4  2   2   |       |            |
5  3   2   |       |            |
6  3   2   |       |            |
[Figure: constraint network with variable V {1, 2, 3, 4, 5, 6} and variables V1, V2, V3, V4, each with domain {1, 2, 3}, linked by constraints C1, C2, C3, C4]
• Scope(Cx): the set of variables involved in Cx
• Arity(Cx): size of scope
Computing NI for non-binary CSPs is not a trivial extension from binary CSPs
NI for non-binary CSPs [2003,2005]
1. Building an nb-DT for each constraint
  – Determines the NI sets of variable given constraint
2. Intersecting partitions from nb-DTs
  – Yields NI sets of V (partition of DV)
3. Processing paths in nb-DTs
  – Gives, for free, updates necessary for forward checking
[Figure: nb-DT(V, C1) with leaves {1, 2}, {3, 4}, {5, 6}; nb-DT(V, C2) with leaves {1, 2}, {3, 4}, {6} and {5}; intersecting the two partitions gives {1, 2}, {3, 4}, {5}, {6}]
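Steps 1 and 2 can be sketched in Python as follows. The tuples of C1 and C2 are transcribed from the table on the previous slide (scopes C1(V, V1, V2) and C2(V, V3)); the function names are hypothetical, and hashing on projected tuples stands in for building the actual nb-DT.

```python
from collections import defaultdict

# Tuples transcribed from the slide's table; V is the first column of each.
C1 = [(1, 1, 3), (1, 3, 3), (2, 1, 3), (2, 3, 3), (3, 1, 1),
      (3, 2, 2), (4, 1, 1), (4, 2, 2), (5, 3, 2), (6, 3, 2)]
C2 = [(1, 3), (2, 3), (3, 2), (4, 2), (6, 1)]
DOMAIN_V = [1, 2, 3, 4, 5, 6]


def partition_by_constraint(domain, tuples, position=0):
    """Step 1: group values whose supporting tuples coincide once the
    variable at `position` is projected out."""
    support = defaultdict(set)
    for t in tuples:
        support[t[position]].add(t[:position] + t[position + 1:])
    groups = defaultdict(list)
    for value in domain:
        groups[frozenset(support[value])].append(value)
    return sorted(groups.values())


def intersect_partitions(p, q):
    """Step 2: two values stay together only if both partitions agree."""
    blocks = [sorted(set(a) & set(b)) for a in p for b in q]
    return sorted(b for b in blocks if b)


p1 = partition_by_constraint(DOMAIN_V, C1)
p2 = partition_by_constraint(DOMAIN_V, C2)
ni_v = intersect_partitions(p1, p2)
```

Value 5 has no supporting tuple in C2, so it ends up alone in the partition derived from C2, and the intersection separates it from 6.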
Robust solutions
• Solution bundle
  – Cartesian product of domain bundles
  – Compact representation
  – Robust solutions
• Dynamic bundling finds larger bundles
Single solution: V1 = d, V2 = e, V3 = a, V4 = c
Static bundling: V1 = d, V2 = {e, f}, V3 = a, V4 = {b, c}
Dynamic bundling: V1 = d, V2 = {d, e, f}, V3 = a, V4 = {b, c}
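Since a solution bundle is a Cartesian product of domain bundles, it can be expanded into individual solutions mechanically. A small Python sketch using the two bundles above (the helper names are hypothetical):

```python
from itertools import product

static_bundle = {"V1": ["d"], "V2": ["e", "f"], "V3": ["a"], "V4": ["b", "c"]}
dynamic_bundle = {"V1": ["d"], "V2": ["d", "e", "f"], "V3": ["a"], "V4": ["b", "c"]}


def bundle_size(bundle):
    """Number of solutions represented: product of the bundle sizes."""
    size = 1
    for values in bundle.values():
        size *= len(values)
    return size


def expand(bundle):
    """List every individual solution represented by the bundle."""
    names = list(bundle)
    return [
        dict(zip(names, combo))
        for combo in product(*(bundle[v] for v in names))
    ]


dynamic_solutions = expand(dynamic_bundle)
```

The static bundle stands for 4 solutions and the dynamic bundle for 6; if one value in a bundle becomes unavailable, the others in the same bundle can replace it without touching the remaining variables, which is the sense in which bundled solutions are robust.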
DynBndl: worth the effort?
• Finds larger bundles
• Enables forward checking at no extra cost
• Does not cost more than BT or static bundling
  – Cost model:
    • # nodes visited by search
    • # constraint checks made
  – Theoretical guarantee holds
    • for finding all solutions
    • under same variable ordering
• Finding first solution?
  – Experiments uncover an unexpected benefit
Bundling of no-goods…
• … is particularly effective
[Figure: search tree over V, V4, V3, V1, V2 on the non-binary example (V {1, 2, 3, 4, 5, 6}; V1 to V4 {1, 2, 3}; constraints C1 to C4), marking a no-good bundle and a solution bundle; node labels {1, 2}, {1, 3}, {3}, {3, 4}, {2}, {1}, {1}, {1}]
Experimental set-up
• CSP parameters:
  – n: number of variables {20,30}
  – a: domain size {10,15}
  – t: constraint tightness [25%, 75%]
  – CR: constraint ratio (arity: 2, 3, 4)
  – 1,000 instances per tightness value
• Phase transition
• Performance measures
  – Nodes visited (NV)
  – Constraint checks (CC)
  – CPU time
  – First Bundle Size (FBS)
[Figure: phase transition. Cost of solving vs. order parameter, with the critical value separating mostly solvable instances from mostly un-solvable instances]
Empirical evaluations
• DynBndl versus FC (BT + forward checking)
• Randomly generated problems, Model B
• Experiments
  – Effect of varying tightness
  – In the phase-transition region
    • Effect of varying domain size
    • Effect of varying constraint ratio (CR)
• ANOVA to statistically compare performance of DynBndl and FC with varying t
• t-distribution for confidence intervals
Analysis: Varying tightness
• Low tightness
  – Large FBS
    • 33 at t=0.35
    • 2254 (Dataset #13, t=0.35)
  – Small additional cost
• Phase transition
  – Multiple solutions present
  – Maximum no-good bundling causes max savings in CPU time, NV, & CC
• High tightness
  – Problems mostly unsolvable
  – Overhead of bundling minimal

[Figure: n=20, a=15, CR=CR3. CPU time [sec] and #NV (hundreds) vs. tightness (0.325 to 0.6), for DynBndl and FC]

t      FBS
0.350  33.44
0.400  10.91
0.425  7.13
0.437  6.38
0.450  5.62
0.462  2.37
0.475  0.66
0.500  0.03
0.550  0.00
Analysis: Varying domain size
• Increasing a in phase-transition
  – FBS increases: More chances for symmetry
  – CPU time decreases: more bundling of no-goods

Increasing a (n=30)
CR   | Improv (CPU) % | FBS
     | a=10   a=15    | a=10   a=15
CR1  | 33.3   34.3    | 5.5    11.9
CR2  | 28.6   33.0    | 5.0    5.5
CR3  | 29.8   31.7    | 3.6    5.0
CR4  | 28.4   31.6    | 1.2    1.4
Because the benefits of DynBndl increase with increasing domain size, DynBndl is particularly interesting for database applications where large domains are typical
Outline
• Definitions
• Bundling in CSPs
• Bundling for join query computation
  – Idea
  – A CSP model for the query join
  – Sorting-based bundling algorithm
  – Dynamic-bundling-based join algorithm
• Conclusions
The join query
SELECT R2.A,R2.B,R2.C
FROM R1,R2
WHERE R1.A=R2.A
AND R1.B=R2.B
AND R1.C=R2.C
R1
A B C
1 12 23
1 13 23
1 14 23
2 10 25
3 16 30
4 10 25
5 12 23
5 13 23
5 14 23
6 13 27
6 14 27
7 14 28
R2
A B C
1 12 23
1 13 23
1 14 23
1 15 23
2 10 25
3 17 20
4 10 25
5 12 23
5 13 23
5 14 23
5 15 23
6 13 27
6 14 27
Result: 10 tuples in 3 nested tuples
R1 ⋈ R2 (compacted)
A B C
{1, 5} {12, 13, 14} {23}
{2, 4} {10} {25}
{6} {13, 14} {27}
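The compacted result can be reproduced with a short Python sketch. Because the query joins on all three attributes of identically structured relations, the join reduces to keeping the tuples present in both. The `compact` function is a simple recursive grouping written for illustration; it is not the join algorithm presented later in the slides.

```python
from collections import defaultdict

R1 = [(1, 12, 23), (1, 13, 23), (1, 14, 23), (2, 10, 25), (3, 16, 30),
      (4, 10, 25), (5, 12, 23), (5, 13, 23), (5, 14, 23), (6, 13, 27),
      (6, 14, 27), (7, 14, 28)]
R2 = [(1, 12, 23), (1, 13, 23), (1, 14, 23), (1, 15, 23), (2, 10, 25),
      (3, 17, 20), (4, 10, 25), (5, 12, 23), (5, 13, 23), (5, 14, 23),
      (5, 15, 23), (6, 13, 27), (6, 14, 27)]


def equi_join_all(r1, r2):
    """Join on every attribute: keep tuples of r1 that also occur in r2."""
    lookup = set(r2)
    return [t for t in r1 if t in lookup]


def compact(tuples):
    """Bundle values of the leading attribute that share the same set of
    remaining-attribute tuples, then recurse on those remaining tuples."""
    tuples = set(tuples)
    if not tuples:
        return []
    if all(len(t) == 0 for t in tuples):
        return [()]
    suffixes = defaultdict(set)
    for t in tuples:
        suffixes[t[0]].add(t[1:])
    bundles = defaultdict(list)
    for value, rest in suffixes.items():
        bundles[frozenset(rest)].append(value)
    nested = []
    for rest, values in bundles.items():
        for tail in compact(rest):
            nested.append((tuple(sorted(values)),) + tail)
    return sorted(nested)


joined = equi_join_all(R1, R2)
nested_tuples = compact(joined)
compaction_rate = len(joined) / len(nested_tuples)
```

On this example the 10 joined tuples collapse into the 3 nested tuples shown above, each nested tuple standing for the Cartesian product of its bundles.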
Databases & CSPs
DB terminology | CSP terminology
Table, relation | Constraint (relational constraint)
Join condition | Constraint (join-condition constraint)
Attribute | CSP variable
Tuple in a table | Tuple in a constraint or allowed by one
Computing a join sequence | Finding all solutions to a CSP
• Same computational problems, different cost models
  – Databases: minimize # I/O operations
  – CSP community: # CPU operations
• Challenges for using CSP techniques in DB
  – Use of lighter data structures to minimize memory usage
  – Fit in the iterator model of database engines
Modeling join query as a CSP
• Attributes of relations → CSP variables
• Attribute values → variable domains
• Relations → relational constraints
• Join conditions → join-condition constraints
SELECT R1.A,R1.B,R1.C
FROM R1,R2
WHERE R1.A=R2.A
AND R1.B=R2.B
AND R1.C=R2.C
[Figure: CSP model of the query, with variables R1.A, R1.B, R1.C and R2.A, R2.B, R2.C and constraints R1 and R2]
Join operator
• R1 ⋈(x θ y) R2
  – Most expensive operator in terms of I/O
  – θ is “=” → Equi-Join
    • x is same as y → Natural Join
• Join algorithms
  – Nested Loop
  – Sorting-based
    • Sort-Merge, Progressive Merge-Join (PMJ)
    • Partitions relations by sorting, minimizes # scans of relations
  – Hashing-based
Join query
• R1 ⋈(x θ y) R2
  – Most expensive operator in terms of I/O
  – θ is “=” → Equi-Join
    • x is same as y → Natural Join
• CSP model
  – Attributes of relations → CSP variables
  – Attribute values → variable domains
  – Relations → relational constraints
  – Join conditions → join-condition constraints
SELECT R1.A,R1.B,R1.C
FROM R1,R2
WHERE R1.A=R2.A
AND R1.B=R2.B
AND R1.C=R2.C
[Figure: CSP model of the query, with variables R1.A, R1.B, R1.C and R2.A, R2.B, R2.C and constraints R1 and R2]
Progressive Merge Join
• PMJ: a sort-merge algorithm [Dittrich et al. 03]
• Two phases
  1. Sorting: sorts sub-sets of relations
  2. Merging phase: merges sorted sub-sets
• PMJ produces early results
• We use the framework of the PMJ
New join algorithm
• Sorting & merging phases
  – Load sub-sets of relations in memory
  – Compute in-memory join using dynamic bundling
    • Uses sorting-based bundling (shown next)
    • Computes join of in-memory relations using dynamically computed bundles
Sorting-based bundling
• Heuristic for variable ordering: place variables linked by join conditions as close to each other as possible
[Figure: variables R1.A, R2.A, R1.B, R2.B, R1.C, R2.C of relations R1 and R2, placed according to the ordering heuristic]
• Sort relations using above ordering
• Next: Compute bundles of variable ahead in variable ordering (R1.A)
Computing a bundle of R1.A
[Figure annotations on R1: partition; unequal partitions; symmetric partitions; bundle {1, 5}]
R1
A B C
1 12 23
1 13 23
1 14 23
2 10 25
5 12 23
5 13 23
5 14 23
• Partition of a constraint
  – Tuples of the relation having the same value of R1.A
• Compare projected tuples of first partition with those of another partition
• Compare with every other partition to get complete bundle
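The partition comparison described above can be sketched in Python using the R1 table from this slide. The function names are hypothetical, and the sketch assumes the relation holds no duplicate tuples.

```python
from itertools import groupby

R1 = [(1, 12, 23), (1, 13, 23), (1, 14, 23), (2, 10, 25),
      (5, 12, 23), (5, 13, 23), (5, 14, 23)]


def partitions_by_first(relation):
    """Sort the relation, then split it into runs of tuples that share the
    same first-attribute value; each run is projected on the other attributes."""
    ordered = sorted(relation)
    return [
        (key, [t[1:] for t in run])
        for key, run in groupby(ordered, key=lambda t: t[0])
    ]


def first_bundle(relation):
    """Bundle of the first partition: every value whose projected partition
    equals the projected first partition."""
    parts = partitions_by_first(relation)
    if not parts:
        return []
    head_value, head_proj = parts[0]
    bundle = [head_value]
    for value, proj in parts[1:]:
        # Partitions of unequal size cannot be symmetric; skip them cheaply.
        if len(proj) == len(head_proj) and proj == head_proj:
            bundle.append(value)
    return bundle


bundle_a = first_bundle(R1)
```

Because the relation is sorted first, the projected partitions are already in the same order and can be compared position by position in a single pass.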
Finding the valid bundle
R1
A B C
1 12 23
1 13 23
1 14 23
2 10 25
3 16 30
3 16 24
R2
A B C
1 12 23
1 13 23
1 14 23
1 15 23
2 10 25
3 17 20
1. Compute a bundle for the attribute
2. Check bundle validity with future constraints
3. If no common value, ‘backtrack’
Assign variable with the surviving values in the bundle
[Figure: bundles {1, 5, x} and {1, 5, y, z}; common values {1, 5}]
Experiments
• XXL library for implementation & evaluation
• Data sets
  – Random: 2 relations R1, R2 with same schema as example
    • Each relation: 10,000 tuples
    • Memory size: 4,000 tuples
    • Page size: 200 tuples
  – Real-world problem: 3 relations, 4 attributes
• Compaction rate achieved
  – Random problem: 1.48
    • Savings even with (very) preliminary implementation
  – Real-world problem: 2.26 (69 tuples in 32 nested tuples)
Outline
• Definitions
• Bundling in CSPs
• Bundling for join query computation
• Conclusions
  – Summary
  – Future research
Summary
• Dynamic bundling in finite CSPs
  – Binary and non-binary constraints
  – Produces multiple robust solutions
  – Significantly reduces cost of search at phase transition
• Application to join-query computation
Constraint Processing inspires innovative solutions to fundamental difficult problems in Databases
Future research
• CSPs
  – Only scratched the surface:
    • interchangeability + decomposition [ECAI 1996],
    • partial interchangeability [AAAI 1998],
    • tractable structures
• Databases
  – Investigate benefit of bundling
    • Sampling operator
    • Main-memory databases
    • Automatic categorization of query results
• Constraint databases
  – Design bundling mechanisms for gap & linear constraints over intervals (spatial databases)