Efficient Detection of Empty Result Queries
32nd International Conference on Very Large Data Bases, September 12-15, 2006, Seoul, Korea
Gang Luo, IBM T.J. Watson Research Center
Empty Result Problem
• Query returns an empty result set
• The user is left unsure where to look next
• Frequently encountered in interactive exploration of massive data sets
• Our contribution: a method for quickly detecting empty result sets
Example Percentages of Empty Result Queries
• In a Customer Relationship Management (CRM) application developed by IBM
– 18.07% (3,396 empty result queries in 18,793 queries)
• In a real estate application developed by IBM
– 5.75%
• In a digital library application [JCM+00]
– 10.53%
• In a bioinformatics application [RCP+98]
– 38%
Empty Result Queries May Not Finish Execution Quickly
• Consider a query joining two relations
– Query execution time is longer than the join time, whether or not the query result set is empty
• Even if a query finishes in a few seconds in a lightly loaded RDBMS, it may take more than a minute in a heavily loaded RDBMS
Outline
• Limitations of previous approaches
• Fast detection method for empty result queries
• Some experiments
Existing Solutions to the Empty Result Problem
• Explain what leads to the empty result set
• Automatically generalize the query so that the generalized query will return some answers
Limitations of Existing Solutions
• Require domain-specific knowledge
• Only apply to a restricted form of queries
• Require an excessive amount of time
• Give too many reasons why the result set is empty
• Users cannot reuse each other’s query results
Our Solution
• Only consider a read-only environment
• From the execution of previous queries, remember the query parts that lead to empty result sets
• When a new query Q arrives, match it against the remembered query parts. If such a match exists, report that Q will return an empty result set without executing Q
• Exploit special properties of empty result sets, which often makes the method more powerful than the traditional materialized view method
Definitions
• Empty result propagating operator: An operator whose output is empty if any input is empty
• Empty result propagating query: A query whose query plan only contains empty result propagating operators (our focus)
• Query part: A sub-tree of a query plan
• Atomic query part: An ordered pair (relation names RN, selection condition SC)
– Corresponds to a relational algebra formula: first take the Cartesian product of all relations in RN, then apply SC
– SC is a conjunction of primitive terms, where each primitive term is a comparison
Definitions – Cont.
• Cover: Atomic query part P1=(RN1, SC1) covers atomic query part P2=(RN2, SC2) if
– RN1 ⊆ RN2
– Whenever SC2 is true, SC1 is true
• Property: Suppose atomic query part P1 covers atomic query part P2. For a given database, if the output of P1 is empty, the output of P2 is also empty.
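The cover relation can be sketched in Python under some simplifying assumptions. Here a selection condition is a set of primitive terms, each either a closed interval on one attribute or an equality between two attributes, and "whenever SC2 is true, SC1 is true" is approximated by a syntactic sufficient test (every term of SC1 is implied by a term of SC2). The names `AtomicQueryPart`, `term_implied`, and `covers` are hypothetical, and this is not the coverage checking algorithm from the paper.

```python
from dataclasses import dataclass
from math import inf


@dataclass(frozen=True)
class AtomicQueryPart:
    relations: frozenset  # RN: relation names
    condition: frozenset  # SC: conjunction of primitive terms


def term_implied(t1, sc2):
    """True if primitive term t1 is implied by some single term in sc2."""
    if t1[0] == "eq":  # attribute-to-attribute equality, e.g. A.c = B.d
        return t1 in sc2
    _, attr, lo1, hi1 = t1  # ("range", attribute, low, high), assumed closed
    return any(
        t2[0] == "range" and t2[1] == attr and lo1 <= t2[2] and t2[3] <= hi1
        for t2 in sc2
    )


def covers(p1, p2):
    """Sufficient test: RN1 is a subset of RN2 and SC2 implies SC1."""
    return p1.relations <= p2.relations and all(
        term_implied(t, p2.condition) for t in p1.condition
    )


join = ("eq", "A.c", "B.d")
p1 = AtomicQueryPart(frozenset({"A", "B"}),
                     frozenset({("range", "A.a", 50, 100), join}))
p2 = AtomicQueryPart(frozenset({"A", "B", "C"}),
                     frozenset({("range", "A.a", 60, 70), join,
                                ("range", "C.f", -inf, 300)}))
p3 = AtomicQueryPart(frozenset({"A", "B"}),
                     frozenset({("range", "A.a", 40, 70), join}))
```

In this sketch `p1` covers `p2` (fewer relations, weaker condition), so an empty output for `p1` implies an empty output for `p2`. `p1` does not cover `p3`, whose interval on A.a extends below 50.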
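Given an Empty Result Query
• Find the lowest-level query part P whose output is empty
[Figure: example query plan; the bracketed numbers are output cardinalities]
– A (table-scan) [40000] → selection 50<A.a<100 ∨ A.b=200 [200] → hash [200]
– B (index-scan on B.e<40 ∨ B.e=50) [5000] → hash [5000]
– hash join A.c=B.d over the two hashed inputs [0] → sort [0]
– C (table-scan) [20000] → selection C.f<300 [1000] → sort [1000]
– sort-merge join B.g=C.h over the two sorted inputs [0]
• In this plan, P is the sub-tree rooted at the hash join A.c=B.d, the lowest operator whose output is empty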
Transforming P into a Simplified Query Part Ps
• Drop all operators (e.g., projection, hash, sort) that have no influence on the emptiness of the output
• Replace each physical join operator with a logical join operator
• Replace each index-scan operator with a table-scan operator followed by a selection operator, where the selection condition is the index-scan condition
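The three rewrite rules above can be sketched as a recursive tree transformation. This is only an illustration: the `Node` class, the operator names, and the sets `DROPPED` and `PHYSICAL_JOINS` are assumptions made for the example, not the representation used in the paper or in any particular RDBMS.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    op: str        # e.g. "table-scan", "index-scan", "select", "hash join"
    arg: str = ""  # relation name, or condition text for select/join
    cond: str = "" # used only by index-scan: the index-scan condition
    children: list = field(default_factory=list)


# Assumed operator vocabularies for this sketch.
DROPPED = {"project", "hash", "sort"}
PHYSICAL_JOINS = {"hash join", "sort-merge join", "nested-loop join"}


def simplify(node):
    """Transform query part P into the simplified query part Ps."""
    kids = [simplify(c) for c in node.children]
    if node.op in DROPPED:          # no influence on emptiness
        return kids[0]
    if node.op in PHYSICAL_JOINS:   # physical join -> logical join
        return Node("join", node.arg, children=kids)
    if node.op == "index-scan":     # index-scan -> table-scan + selection
        return Node("select", node.cond,
                    children=[Node("table-scan", node.arg)])
    return Node(node.op, node.arg, node.cond, kids)


# The query part P from the example plan (hash join over A and B).
p = Node("hash join", "A.c=B.d", children=[
    Node("hash", children=[
        Node("select", "50<A.a<100 or A.b=200",
             children=[Node("table-scan", "A")])]),
    Node("hash", children=[
        Node("index-scan", "B", cond="B.e<40 or B.e=50")]),
])
ps = simplify(p)
```

Applied to the example, `ps` is a logical join on A.c=B.d whose two inputs are selections over table-scans of A and B.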
Transforming P into a Simplified Query Part Ps – Cont.
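• Corresponding relational algebra formula
– $\sigma_{50<A.a<100 \,\lor\, A.b=200}(A) \bowtie_{A.c=B.d} \sigma_{B.e<40 \,\lor\, B.e=50}(B)$
[Figure: simplified query part Ps. Table-scans of A and B each feed a selection (50<A.a<100 ∨ A.b=200 on A; B.e<40 ∨ B.e=50 on B), and the two selections feed the join ⋈ on A.c=B.d.]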
Breaking Ps into Atomic Query Parts
• Get all selection conditions in the selection/join operators
• Rewrite the conjunction of these selection conditions into a disjunctive normal form (DNF)
– Negations on numeric or string attributes are removed using complementary operators
– Interval-based comparison is treated as a single primitive term
• Generate a set of atomic query parts (RN, SC)
– RN: input relations of all table-scan operators in Ps
– SC: a term in the DNF
Breaking Ps into Atomic Query Parts – Cont.
• Property: The following three assertions are equivalent to each other:
– The output of the query part P is empty
– The output of the simplified query part Ps is empty
– The output of each generated atomic query part is empty
• Atomic query parts generated for the example:
– $\sigma_{50<A.a<100}(A) \bowtie_{A.c=B.d} \sigma_{B.e<40}(B)$
– $\sigma_{A.b=200}(A) \bowtie_{A.c=B.d} \sigma_{B.e<40}(B)$
– $\sigma_{50<A.a<100}(A) \bowtie_{A.c=B.d} \sigma_{B.e=50}(B)$
– $\sigma_{A.b=200}(A) \bowtie_{A.c=B.d} \sigma_{B.e=50}(B)$
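A minimal sketch of the DNF step, assuming each selection or join condition is given as a disjunction of primitive terms (a list of strings). Distributing the conjunction over these disjunctions is then a Cartesian product, and every combination becomes the SC of one atomic query part. The function name `atomic_query_parts` and the string encoding of terms are hypothetical.

```python
from itertools import product


def atomic_query_parts(relations, conjuncts):
    """Expand a conjunction of disjunctions into atomic query parts.

    relations: names of the relations read by the table-scans in Ps (RN).
    conjuncts: list of disjunctions, each a list of primitive terms.
    Returns one (RN, SC) pair per term of the resulting DNF.
    """
    rn = frozenset(relations)
    return [(rn, frozenset(combo)) for combo in product(*conjuncts)]


# Conditions of the example Ps: two selections and one join condition.
parts = atomic_query_parts(
    ["A", "B"],
    [["50<A.a<100", "A.b=200"], ["A.c=B.d"], ["B.e<40", "B.e=50"]],
)
print(len(parts))  # 4 atomic query parts, as in the example
```

A disjunct that is itself a conjunction of several terms would need to be flattened first; the sketch leaves that case out.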
Storing the Generated Atomic Query Parts
• For each generated atomic query part Pa
– Insert Pa into a collection Caqp of atomic query parts
– Remove from Caqp all previously stored atomic query parts that are covered by Pa
• See paper for details of the coverage checking algorithm
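The storing step might look like the following sketch. It assumes atomic query parts are (RN, SC) pairs of frozensets and uses a deliberately simple stand-in for coverage: P1 covers P2 if RN1 ⊆ RN2 and every term of SC1 appears verbatim in SC2. That test is sufficient but far weaker than the coverage checking algorithm described in the paper; the names `covers` and `store` are hypothetical.

```python
def covers(p1, p2):
    """Simplified, purely syntactic coverage test (an assumption)."""
    (rn1, sc1), (rn2, sc2) = p1, p2
    return rn1 <= rn2 and sc1 <= sc2


def store(caqp, pa):
    """Return Caqp with pa inserted and parts covered by pa removed."""
    kept = [p for p in caqp if not covers(pa, p)]
    kept.append(pa)
    return kept


old = (frozenset({"A", "B"}),
       frozenset({"A.a<10", "A.c=B.d", "B.e<40"}))
other = (frozenset({"C"}), frozenset({"C.f<300"}))
new = (frozenset({"A"}), frozenset({"A.a<10"}))

caqp = store(store([], old), other)
caqp = store(caqp, new)  # new covers old, so old is dropped
```

Dropping covered parts keeps the collection small without losing detection power, since anything the removed part covers is also covered by the new part.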
When Getting a New Query Q
• Break Q into a set of atomic query parts
• For each such atomic query part Pa, check whether some atomic query part Ai in Caqp covers Pa
• If such an Ai exists for each Pa, report that Q will return an empty result set without executing Q
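The check for a new query can be sketched as follows, again assuming (RN, SC) pairs of frozensets and a simplified syntactic coverage test in place of the paper's algorithm. The names `covers` and `will_be_empty` are hypothetical.

```python
def covers(p1, p2):
    """Simplified coverage test: RN1 subset of RN2, SC1 terms all in SC2."""
    (rn1, sc1), (rn2, sc2) = p1, p2
    return rn1 <= rn2 and sc1 <= sc2


def will_be_empty(query_parts, caqp):
    """True only if every atomic query part of Q is covered by some Ai."""
    return bool(query_parts) and all(
        any(covers(ai, pa) for ai in caqp) for pa in query_parts
    )


ab = frozenset({"A", "B"})
caqp = [(frozenset({"A"}), frozenset({"A.a<10"}))]

# Q1: both atomic query parts contain the stored term A.a<10.
q1 = [(ab, frozenset({"A.a<10", "A.c=B.d", "B.e<40"})),
      (ab, frozenset({"A.a<10", "A.c=B.d", "B.e=50"}))]
# Q2: the second atomic query part is not covered by anything stored.
q2 = [(ab, frozenset({"A.a<10", "A.c=B.d"})),
      (ab, frozenset({"A.b=200", "A.c=B.d"}))]
```

Here `will_be_empty(q1, caqp)` is true and `will_be_empty(q2, caqp)` is false. A false answer only means emptiness could not be established from the stored parts, so the query still has to be executed.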
Setup
• Testing environment
– PostgreSQL 7.3.4
– Windows XP OS
– Dell Inspiron 8500 PC with one 2.2GHz CPU, 512MB memory, one 40GB disk
• TPC-R benchmark
• See paper for detection probability analysis
Overhead Experiment
• Query Q1: Find the information about certain parts that were sold on certain days
select *
from orders o, lineitem l
where o.orderkey=l.orderkey
  and (o.orderdate=d1 or … or o.orderdate=de)
  and (l.partkey=p1 or … or l.partkey=pf);
Overhead Experiment – Cont.
• Query Q2: Find the information about certain parts that were sold to certain customers on certain days
select *
from orders o, lineitem l, customer c
where o.orderkey=l.orderkey
  and o.custkey=c.custkey
  and (o.orderdate=d1 or … or o.orderdate=de)
  and (l.partkey=p1 or … or l.partkey=pf)
  and (c.nationkey=n1 or … or c.nationkey=ng);
Overhead Experiment – Cont.
• The overhead of our method increases with both query complexity and the number of atomic query parts stored in Caqp
• The overhead of our method is higher when the check fails than when it succeeds
[Figure: overhead (seconds, axis from 0 to 0.008) versus number of atomic query parts in Caqp (1000 to 3000); series: Q1 check succeeds, Q1 check fails, Q2 check succeeds, Q2 check fails]
Overhead Experiment – Cont.
• The overhead of our method is trivial compared to query execution overhead
[Figure: execution time or overhead (seconds, logarithmic axis from 0.001 to 1000) versus database size (1 to 3 GB); series: execute Q1, check Q1, execute Q2, check Q2]
Summary
• Provide a fast detection method for empty result queries
– Low overhead
– High detection probability once enough information has been accumulated
Open Issues
• In the presence of updates, correctly preserve as much stored information as possible
• A hybrid method that can combine the advantages of both our method and the existing solutions
• More aggressive storage saving technique