Relaxing Join and Selection Queries
Rares Vernica, UC Irvine, USA
Joint work with Nick Koudas, Chen Li, and Anthony K. H. Tung
Rares Vernica, UC Irvine 2
Query Example
SELECT * FROM Jobs J, Candidates C
WHERE J.Salary <= 95
  AND J.Zipcode = C.Zipcode
  AND C.WorkExp >= 5;
Jobs
ID   Company     Zipcode  Salary
J1   Broadcom    92047    80
J2   Intel       93652    95
J3   Microsoft   82632    120
J4   IBM         90391    130
...  ...         ...      ...

Candidates
ID   Zipcode  ExpSalary  WorkExp
C1   93652    120        3
C2   92612    130        6
C3   82632    100        5
C4   90391    150        1
...  ...      ...        ...
What if the query answer is empty?
Adjust the conditions
• What conditions to adjust?
• How to adjust them?
Example Percentages of Empty-Result Queries
• In a Customer Relationship Management (CRM) application developed by IBM: 18.07% (3,396 empty-result queries out of 18,793 queries)
• In a real estate application developed by IBM: 5.75%
• In a digital library application [JCM+00]: 10.53%
• In a bioinformatics application [RCP+98]: 38%
Source: Gang Luo (IBM T.J. Watson Research Center, USA): Efficient Detection of Empty-Result Queries, VLDB 2006, p. 1015
Observations
• Different ways to adjust the conditions: selection vs. join
• How much to adjust each condition? Salary <= 100 vs. Salary <= 120
• Adjust the join vs. adjust both selections
(Original selections: Salary <= 95 on Jobs, WorkExp >= 5 on Candidates)
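The observations can be checked on the four sample rows of the Jobs and Candidates tables shown above. The sketch below is a plain-Python stand-in for the example SQL query (the names `JOBS`, `CANDIDATES`, and `run_query` are illustrative, and the elided "..." rows are left out). It re-runs the query with different thresholds.

```python
# Sample rows from the slides; the elided "..." rows are not included.
JOBS = [  # (ID, Company, Zipcode, Salary)
    ("J1", "Broadcom", 92047, 80),
    ("J2", "Intel", 93652, 95),
    ("J3", "Microsoft", 82632, 120),
    ("J4", "IBM", 90391, 130),
]
CANDIDATES = [  # (ID, Zipcode, ExpSalary, WorkExp)
    ("C1", 93652, 120, 3),
    ("C2", 92612, 130, 6),
    ("C3", 82632, 100, 5),
    ("C4", 90391, 150, 1),
]


def run_query(max_salary, min_work_exp):
    """Nested-loop version of:
    SELECT * FROM Jobs J, Candidates C
    WHERE J.Salary <= max_salary AND J.Zipcode = C.Zipcode
      AND C.WorkExp >= min_work_exp
    Returns the matching (Jobs ID, Candidates ID) pairs."""
    return [
        (job[0], cand[0])
        for job in JOBS
        for cand in CANDIDATES
        if job[3] <= max_salary and job[2] == cand[1] and cand[3] >= min_work_exp
    ]


print(run_query(95, 5))   # [] : the original query is empty
print(run_query(100, 5))  # [] : Salary <= 100 is not enough
print(run_query(120, 5))  # [('J3', 'C3')]
print(run_query(95, 3))   # [('J2', 'C1')] : relax WorkExp instead
```

The same empty query can thus be repaired by relaxing different conditions by different amounts, which is exactly the choice the relaxation framework has to make.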
Contributions
• Query relaxation framework for selections and joins
• Lattice-based approach for query relaxation
• Efficient relaxation algorithms
Overview
1. Motivation
2. Query Relaxation
3. Lattice-based Relaxation
4. Relaxation Algorithms
5. Variations
6. Experiments
Query Relaxation
• Top-k / Nearest neighbor: requires a weight for each condition
• Skyline: no weights are needed; conditions are not considered equal; returns the non-dominated points
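To contrast the two semantics, here is a minimal sketch that describes each answer by how much each condition must be relaxed (smaller is better). The vectors and weights are hypothetical, and the Skyline uses a naive nested-loop dominance test rather than any of the algorithms in this talk. Changing the weights changes the Top-k winner, while the Skyline needs no weights at all.

```python
def dominates(a, b):
    """a dominates b: a needs no more relaxation on any condition
    and strictly less on at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(points):
    """Naive Skyline: every point that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def top_k(points, weights, k):
    """Weighted Top-k: needs one weight per condition; smaller score is better."""
    return sorted(points, key=lambda p: sum(w * x for w, x in zip(weights, p)))[:k]


# Hypothetical relaxation vectors: (Salary relaxation, WorkExp relaxation).
points = [(0, 2), (25, 0), (35, 4), (10, 1)]
print(skyline(points))               # [(0, 2), (25, 0), (10, 1)]
print(top_k(points, (1, 1), k=1))    # [(0, 2)]
print(top_k(points, (1, 100), k=1))  # [(25, 0)]
```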
Query Relaxation
[Figure: the Skyline of relaxed answers, plotted by J.Salary and C.WorkExp against the original conditions J.Salary <= 95 and C.WorkExp >= 5]
Stephan Börzsönyi, Donald Kossmann, Konrad Stocker: The Skyline Operator. ICDE 2001
Lattice-based Relaxation
Lattice of conditions to relax:
        RJS
   RJ   RS   JS
   R    J    S
        ∅
R – selection on Jobs (Salary <= 95)
J – join condition (J.Zipcode = C.Zipcode)
S – selection on Candidates (WorkExp >= 5)
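The lattice can be generated mechanically: each node is a subset of {R, J, S}, and moving up one level adds exactly one more condition to relax. The sketch below (the function names `lattice_levels` and `parents` are hypothetical) only builds the structure; it does not model how the relaxation algorithms choose which nodes to evaluate.

```python
from itertools import combinations

# R: selection on Jobs, J: join condition, S: selection on Candidates.
CONDITIONS = ("R", "J", "S")


def lattice_levels(conditions):
    """Subsets of the conditions grouped by size, from relaxing nothing
    (the empty string) up to relaxing every condition."""
    return [
        ["".join(subset) for subset in combinations(conditions, size)]
        for size in range(len(conditions) + 1)
    ]


def parents(node, conditions):
    """Nodes one level up: the same subset plus exactly one more condition."""
    return [
        "".join(c for c in conditions if c in node or c == extra)
        for extra in conditions
        if extra not in node
    ]


# Print from the top (relax everything) down to the bottom (relax nothing).
for level in reversed(lattice_levels(CONDITIONS)):
    print("  ".join(node or "∅" for node in level))
print(parents("R", CONDITIONS))  # ['RJ', 'RS']
```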
Relaxing Selection Conditions
Algorithm:
1. Compute the Skyline on Jobs
2. Compute the Skyline on Candidates
3. Join the Skylines
INCORRECT: joining the Skyline on Jobs (Salary <= 95) with the Skyline on Candidates (WorkExp >= 5) gives an Empty Join.
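Why this per-table algorithm fails can be seen on the sample rows. The sketch below assumes that relaxing a condition is measured as the distance from its original bound (`max(0, Salary - 95)` and `max(0, 5 - WorkExp)`); the slides do not fix this metric. With a single selection per table, each per-table Skyline is simply the set of tuples with minimal relaxation: {J1, J2} for Jobs and {C2, C3} for Candidates. These share no Zipcode, so the join is empty even though joinable pairs exist (for example J2 and C1, both in 93652).

```python
# Sample rows: ID -> (Zipcode, Salary) and ID -> (Zipcode, WorkExp).
JOBS = {"J1": (92047, 80), "J2": (93652, 95), "J3": (82632, 120), "J4": (90391, 130)}
CANDIDATES = {"C1": (93652, 3), "C2": (92612, 6), "C3": (82632, 5), "C4": (90391, 1)}


def salary_relax(salary):
    return max(0, salary - 95)  # assumed cost of loosening Salary <= 95


def workexp_relax(work_exp):
    return max(0, 5 - work_exp)  # assumed cost of loosening WorkExp >= 5


def one_dim_skyline(table, relax):
    """Skyline over a single relaxed condition: the tuples with minimal relaxation."""
    best = min(relax(value) for _, value in table.values())
    return {key for key, (_, value) in table.items() if relax(value) == best}


jobs_sky = one_dim_skyline(JOBS, salary_relax)
cand_sky = one_dim_skyline(CANDIDATES, workexp_relax)
# Step 3 of the incorrect algorithm: join the two Skylines on Zipcode.
joined = [
    (j, c)
    for j in sorted(jobs_sky)
    for c in sorted(cand_sky)
    if JOBS[j][0] == CANDIDATES[c][0]
]
print(sorted(jobs_sky), sorted(cand_sky), joined)  # ['J1', 'J2'] ['C2', 'C3'] []
```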
Relaxing Selection Conditions
Join First Algorithm:
1. Compute the join (disregarding the selections)
2. Compute the Skyline on the join results
[Figure: Join, then Skyline, for the relaxed selections Salary <= 95 and WorkExp >= 5]
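A minimal sketch of Join First on the sample rows, assuming relaxation is measured as the distance from each original bound (an assumption; the slides do not define the metric). The Skyline is (J2, C1), which needs WorkExp >= 3, and (J3, C3), which needs Salary <= 120; (J4, C4) is dominated by (J2, C1).

```python
# Sample rows: ID -> (Zipcode, Salary) and ID -> (Zipcode, WorkExp).
JOBS = {"J1": (92047, 80), "J2": (93652, 95), "J3": (82632, 120), "J4": (90391, 130)}
CANDIDATES = {"C1": (93652, 3), "C2": (92612, 6), "C3": (82632, 5), "C4": (90391, 1)}


def relaxation(job, cand):
    """Assumed relaxation vector of a joined pair: how far Salary <= 95
    and WorkExp >= 5 must be loosened to admit it."""
    return (max(0, JOBS[job][1] - 95), max(0, 5 - CANDIDATES[cand][1]))


def dominates(a, b):
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def join_first():
    # Step 1: equi-join on Zipcode, disregarding both selections.
    pairs = [(j, c) for j in JOBS for c in CANDIDATES if JOBS[j][0] == CANDIDATES[c][0]]
    # Step 2: Skyline over the relaxation vectors of the join results.
    vectors = {pair: relaxation(*pair) for pair in pairs}
    return [p for p in pairs if not any(dominates(vectors[q], vectors[p]) for q in pairs)]


print(join_first())  # [('J2', 'C1'), ('J3', 'C3')]
```

This is correct, but it materializes the whole join before any pruning, which is what the variations on the next slide try to avoid.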
Relaxing Selection Conditions: Variations
• Pruning Join: build the Skyline during the join
• Pruning Join+: Pruning Join, plus build the local Skyline before the join
• Sorted Access Join: as in Fagin's Top-k, sort the columns on relaxation, then compute the join Skyline
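The sketch below is one possible reading of the two pruning variations, not the authors' implementation. `insert_into_skyline` maintains the Skyline while join results are produced (Pruning Join). `local_skyline` prunes each table before the join (Pruning Join+): among tuples that share a Zipcode, a tuple that needs more relaxation than another can only produce join results dominated by the other tuple's results, so with one selection per table only the minimal tuples per Zipcode are kept. The relaxation metric (distance from the original bound), the extra row J5, and all function names are assumptions.

```python
from collections import defaultdict

# Sample rows: (ID, Zipcode, Salary) and (ID, Zipcode, WorkExp).
JOBS = [("J1", 92047, 80), ("J2", 93652, 95), ("J3", 82632, 120), ("J4", 90391, 130)]
CANDIDATES = [("C1", 93652, 3), ("C2", 92612, 6), ("C3", 82632, 5), ("C4", 90391, 1)]


def salary_relax(salary):
    return max(0, salary - 95)  # assumed metric for Salary <= 95


def workexp_relax(work_exp):
    return max(0, 5 - work_exp)  # assumed metric for WorkExp >= 5


def dominates(a, b):
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def insert_into_skyline(skyline, key, vec):
    """Pruning Join: add a new join result unless it is dominated,
    and evict the results it dominates."""
    if any(dominates(v, vec) for _, v in skyline):
        return skyline
    return [(k, v) for k, v in skyline if not dominates(vec, v)] + [(key, vec)]


def local_skyline(rows, relax):
    """Pruning Join+: per Zipcode, keep only the rows with minimal relaxation."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[1]].append(row)
    kept = []
    for group in groups.values():
        best = min(relax(r[2]) for r in group)
        kept.extend(r for r in group if relax(r[2]) == best)
    return kept


def pruning_join_plus():
    jobs = local_skyline(JOBS, salary_relax)
    by_zip = defaultdict(list)
    for cand in local_skyline(CANDIDATES, workexp_relax):
        by_zip[cand[1]].append(cand)
    skyline = []
    for job in jobs:  # hash join on Zipcode; the Skyline is maintained as pairs arrive
        for cand in by_zip.get(job[1], []):
            vec = (salary_relax(job[2]), workexp_relax(cand[2]))
            skyline = insert_into_skyline(skyline, (job[0], cand[0]), vec)
    return [key for key, _ in skyline]


print(pruning_join_plus())  # [('J2', 'C1'), ('J3', 'C3')]
# A hypothetical extra Jobs row J5 in 93652 with Salary 99 is pruned locally by J2:
print(local_skyline(JOBS + [("J5", 93652, 99)], salary_relax))
```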
Relaxing All Conditions
Multi-Dim.-Index-based-Relaxation Algorithm:
1. Traverse the index structure top-down
2. Form pairs of nodes or records
3. Build the Skyline
[Figure: R-trees on Jobs (Zipcode, Salary) and Candidates (Zipcode, WorkExp). Jobs root: Zipcode 82632–93652, Salary 80–130; leaves {J3, J4} (Zipcode 82632–90391, Salary 120–130) and {J1, J2} (Zipcode 92047–93652, Salary 80–95). Candidates root: Zipcode 82632–93652, WorkExp 1–6; leaves {C3, C4} (Zipcode 82632–90391, WorkExp 1–5) and {C1, C2} (Zipcode 92612–93652, WorkExp 3–6). Pairs of nodes wait in a Queue while the Skyline is built.]
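A rough sketch of the index-based idea for relaxing all three conditions. It swaps in a best-first, branch-and-bound Skyline traversal over pairs of index entries; the paper's exact traversal may differ. Assumptions: a hand-built two-level stand-in for the R-trees (the leaf groupings from the figure), Zipcodes compared numerically so the relaxed join costs `|J.Zipcode - C.Zipcode|`, and selection costs measured as the distance from the original bounds. Each node pair gets a lower-bound relaxation vector; a pair is discarded when some Skyline answer dominates its bound, because that answer then dominates every record pair inside. Pairs are visited in increasing sum of their bound, so a dominating answer is found before the answers it dominates.

```python
import heapq
from itertools import count

# Hypothetical stand-in for the two R-trees: leaf nodes of (ID, Zipcode, value).
JOB_LEAVES = [  # value = Salary
    [("J3", 82632, 120), ("J4", 90391, 130)],
    [("J1", 92047, 80), ("J2", 93652, 95)],
]
CAND_LEAVES = [  # value = WorkExp
    [("C3", 82632, 5), ("C4", 90391, 1)],
    [("C1", 93652, 3), ("C2", 92612, 6)],
]


def mbr(records):
    """Bounding box ((min Zipcode, max Zipcode), (min value, max value))."""
    zips = [r[1] for r in records]
    vals = [r[2] for r in records]
    return (min(zips), max(zips)), (min(vals), max(vals))


def lower_bound(job_box, cand_box):
    """Smallest (Salary, Zipcode, WorkExp) relaxation any pair inside can need."""
    (jz_lo, jz_hi), (salary_lo, _) = job_box
    (cz_lo, cz_hi), (_, work_exp_hi) = cand_box
    zip_gap = max(0, cz_lo - jz_hi, jz_lo - cz_hi)  # 0 if the Zipcode ranges overlap
    return (max(0, salary_lo - 95), zip_gap, max(0, 5 - work_exp_hi))


def dominates(a, b):
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def index_skyline():
    tie = count()  # tie-breaker so heapq never compares the node lists
    heap = []

    def push(jobs, cands):
        bound = lower_bound(mbr(jobs), mbr(cands))
        heapq.heappush(heap, (sum(bound), next(tie), bound, jobs, cands))

    for jobs in JOB_LEAVES:  # start from all pairs of leaf nodes
        for cands in CAND_LEAVES:
            push(jobs, cands)
    skyline = []  # ((Jobs ID, Candidates ID), relaxation vector)
    while heap:
        _, _, bound, jobs, cands = heapq.heappop(heap)
        if any(dominates(vec, bound) for _, vec in skyline):
            continue  # everything under this pair is dominated
        if len(jobs) == 1 and len(cands) == 1:  # record pair: the bound is exact
            skyline.append(((jobs[0][0], cands[0][0]), bound))
        else:  # node pair: form record pairs
            for job in jobs:
                for cand in cands:
                    push([job], [cand])
    return [ids for ids, _ in skyline]


print(index_skyline())
```

On the sample rows this returns (J2, C1), (J3, C3), and (J1, C2); the last answer appears only because the join itself is relaxed (Zipcodes 92047 and 92612 are 565 apart), while the node pairs {J1, J2} x {C3, C4} and {J3, J4} x {C1, C2} are pruned without being expanded.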
Variations
• Computing Top-k over the Skyline: a weight for each condition
• Queries with multiple joins
• Conditions on nonnumeric attributes: a dominance checking function
[Figure: the Top 2 answers, plotted by J.Salary and C.WorkExp against J.Salary <= 95 and C.WorkExp >= 5]
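Top-k over the Skyline can be sketched as: compute the Skyline first, then rank only its members by a weighted sum of the relaxations. With the hypothetical vectors and weights below, plain weighted Top-k returns an answer ("c") that another answer ("a") dominates, while ranking only the Skyline answers cannot. The names and numbers are illustrative; for nonnumeric attributes, `dominates` would be replaced by a domain-specific dominance checking function.

```python
def dominates(a, b):
    """a dominates b: no larger relaxation anywhere, strictly smaller somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def weighted(vec, weights):
    return sum(w * x for w, x in zip(weights, vec))


def top_k(answers, weights, k):
    """Plain weighted Top-k over all answers (smaller score is better)."""
    ranked = sorted(answers, key=lambda name: weighted(answers[name], weights))
    return ranked[:k]


def top_k_over_skyline(answers, weights, k):
    """Keep only the Skyline answers, then rank those by the weighted sum."""
    sky = {
        name: vec
        for name, vec in answers.items()
        if not any(dominates(other, vec) for other in answers.values())
    }
    return top_k(sky, weights, k)


# Hypothetical relaxation vectors: (Salary relaxation, WorkExp relaxation).
answers = {"a": (0, 2), "b": (40, 0), "c": (1, 3)}
print(top_k(answers, (1, 10), 2))               # ['a', 'c'] ("c" is dominated by "a")
print(top_k_over_skyline(answers, (1, 10), 2))  # ['a', 'b']
```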
Experimental Setting
• Datasets
  – Real:
    1. Internet Movie Database (IMDB): Movies (120k) & ActorInMovies (1.2m)
    2. Census-Income, UCI KDD Repository: Census (200k)
  – Synthetic: Independent, Correlated, and Anticorrelated
• Implementation: GNU C++; Spatial Index Library (R-tree); Linux, AMD Opteron 240, 1GB RAM
Different algorithms, different behaviors
[Figure: IMDB Dataset]
Different datasets, different behaviors
[Figures: Correlated Dataset, Anticorrelated Dataset, Independent Dataset]
How big is the Skyline?
Relaxing the join takes time
[Figure: Self-join on Census Dataset]
Top-k over Skyline
[Figure: IMDB Dataset]
Related Work
• Muslea et al.: alternate forms of conjunctive expressions
• Efficient Skyline algorithms: target selection queries
• Efficient Top-k algorithms: require weights for the conditions
Conclusions
• Query relaxation framework for selections and joins
• Lattice-based approach for query relaxation
• Efficient relaxation algorithms
Future Work
• Optimal use of the lattice structure
• Relax conditions on string attributes
• Algorithms applicable outside of databases
Questions?
Skyline vs. Top-k
[Figures: the Top 2 answers and the Skyline, each plotted by J.Salary and C.WorkExp against J.Salary <= 95 and C.WorkExp >= 5]
Skyline vs. Top-k over Skyline
[Figures: the Skyline and the Top 2 answers taken from the Skyline, each plotted by J.Salary and C.WorkExp against J.Salary <= 95 and C.WorkExp >= 5]