Depth Estimation for Ranking Query Optimization
![Page 1: Depth Estimation for Ranking Query Optimization](https://reader036.fdocuments.net/reader036/viewer/2022062322/56814865550346895db57438/html5/thumbnails/1.jpg)
Depth Estimation for Ranking Query Optimization
Karl Schnaitter, UC Santa Cruz
Joshua Spiegel, BEA Systems, Inc.
Neoklis Polyzotis, UC Santa Cruz
![Page 2: Depth Estimation for Ranking Query Optimization](https://reader036.fdocuments.net/reader036/viewer/2022062322/56814865550346895db57438/html5/thumbnails/2.jpg)
2
Relational Ranking Queries
SELECT h.hid, r.rid, e.eid
FROM Hotels h, Restaurants r, Events e
WHERE h.city = r.city AND r.city = e.city
RANK BY 0.3/h.price + 0.5*r.rating + 0.2*isMusic(e)
LIMIT 10
• A base score for each table in [0,1]
– Hotels: bH(h) = 1/h.price
– Restaurants: bR(r) = r.rating
– Events: bE(e) = isMusic(e)
• Combined with a scoring function S
– S(bH, bR, bE) = 0.3*bH + 0.5*bR + 0.2*bE
• Return top k results based on S
– In this case, k = 10
![Page 3: Depth Estimation for Ranking Query Optimization](https://reader036.fdocuments.net/reader036/viewer/2022062322/56814865550346895db57438/html5/thumbnails/3.jpg)
3
Ranking Query Execution
SELECT h.hid, r.rid, e.eid
FROM Hotels h, Restaurants r, Events e
WHERE h.city = r.city AND r.city = e.city
RANK BY 0.3/h.price + 0.5*r.rating + 0.2*isMusic(e)
LIMIT 10
• Conventional plan: join H and R, join the result with E, sort on S, fetch 10 results
• Rank-aware plan: rank join of H and R, rank join of the result with E, results ordered by score, fetch 10 results
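
The conventional plan can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample rows, the `is_music` field, and the function names are hypothetical, since the slides give no table contents. It joins everything, sorts on S, and only then fetches the top k, which is the work a rank-aware plan tries to avoid.

```python
from itertools import product

# Hypothetical sample rows; prices are >= 1 so that 1/price stays in [0, 1].
hotels = [{"hid": 1, "city": "SC", "price": 2.0},
          {"hid": 2, "city": "SF", "price": 4.0}]
restaurants = [{"rid": 1, "city": "SC", "rating": 0.9},
               {"rid": 2, "city": "SF", "rating": 0.6}]
events = [{"eid": 1, "city": "SC", "is_music": 1},
          {"eid": 2, "city": "SF", "is_music": 0}]


def score(h, r, e):
    """Scoring function S from the example query."""
    return 0.3 / h["price"] + 0.5 * r["rating"] + 0.2 * e["is_music"]


def conventional_top_k(k):
    """Join all three tables, sort every join result on S, return the first k."""
    joined = [(h, r, e)
              for h, r, e in product(hotels, restaurants, events)
              if h["city"] == r["city"] == e["city"]]
    ranked = sorted(joined, key=lambda t: score(*t), reverse=True)
    return [(h["hid"], r["rid"], e["eid"]) for h, r, e in ranked[:k]]
```

Every join result is materialized and sorted here even when k is small.
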
![Page 4: Depth Estimation for Ranking Query Optimization](https://reader036.fdocuments.net/reader036/viewer/2022062322/56814865550346895db57438/html5/thumbnails/4.jpg)
4
Depth Estimation
• Depth: number of accessed tuples
– Indicates execution cost
– Linked to memory consumption
• The problem: Estimate depths for each operator in a rank-aware plan
[Figure: a rank join over inputs H and R, with the left depth marked on H and the right depth marked on R]
![Page 5: Depth Estimation for Ranking Query Optimization](https://reader036.fdocuments.net/reader036/viewer/2022062322/56814865550346895db57438/html5/thumbnails/5.jpg)
5
Depth Estimation Methods
• Ilyas et al. (SIGMOD 2004)
– Uses probabilistic model of data
– Assumes relations of equal size and a scoring function that sums scores
– Limited applicability
• Li et al. (SIGMOD 2005)
– Samples a subset of rows from each table
– Independent samples give a poor model of join results
![Page 6: Depth Estimation for Ranking Query Optimization](https://reader036.fdocuments.net/reader036/viewer/2022062322/56814865550346895db57438/html5/thumbnails/6.jpg)
6
Our Solution: DEEP
• DEpth Estimation for Physical plans
• Strengths of DEEP
– A principled methodology
  • Uses statistical model of data distribution
  • Formally computes depth over statistics
– Efficient estimation algorithms
– Widely applicable
  • Works with state-of-the-art physical plans
  • Realizable with common data synopses
![Page 7: Depth Estimation for Ranking Query Optimization](https://reader036.fdocuments.net/reader036/viewer/2022062322/56814865550346895db57438/html5/thumbnails/7.jpg)
7
Outline
• Preliminaries
• DEEP Framework
• Experimental Results
![Page 9: Depth Estimation for Ranking Query Optimization](https://reader036.fdocuments.net/reader036/viewer/2022062322/56814865550346895db57438/html5/thumbnails/9.jpg)
9
Monotonic Functions
• A function f(x1,...,xn) is monotonic if ∀i (xi ≤ yi) ⇒ f(x1,...,xn) ≤ f(y1,...,yn)
[Figure: plot of f(x) against x]
![Page 10: Depth Estimation for Ranking Query Optimization](https://reader036.fdocuments.net/reader036/viewer/2022062322/56814865550346895db57438/html5/thumbnails/10.jpg)
10
Monotonic Functions
• A function f(x1,...,xn) is monotonic if ∀i (xi ≤ yi) ⇒ f(x1,...,xn) ≤ f(y1,...,yn)
• Most scoring functions are monotonic
– E.g. sum, product, avg, max, min
• Monotonicity enables bound on score
– In example query, score was 0.3/h.price + 0.5*r.rating + 0.2*isMusic(e)
– Given a restaurant r, upper bound is 0.3*1 + 0.5*r.rating + 0.2*1
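
The bound on the slide follows from substituting the maximum base score (1) for every input that has not been seen yet. A minimal sketch, assuming base scores lie in [0,1] as stated earlier; the function names are illustrative only:

```python
def score(b_h, b_r, b_e):
    """Scoring function S from the example query, over base scores in [0, 1]."""
    return 0.3 * b_h + 0.5 * b_r + 0.2 * b_e


def upper_bound_given_restaurant(rating, max_b_h=1.0, max_b_e=1.0):
    """Best score any join result containing this restaurant could reach.

    Because S is monotonic, plugging in the maximum hotel and event
    base scores can only overestimate the true score.
    """
    return score(max_b_h, rating, max_b_e)
```

For a restaurant with rating 0.8 the bound is 0.3*1 + 0.5*0.8 + 0.2*1 = 0.9, and no actual combination involving that restaurant can score higher.
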
![Page 11: Depth Estimation for Ranking Query Optimization](https://reader036.fdocuments.net/reader036/viewer/2022062322/56814865550346895db57438/html5/thumbnails/11.jpg)
11
Hash Rank Join [IAE04]
• The Hash Rank Join algorithm
– Joins inputs sorted by score
– Returns results with highest score
• Main ideas
– Alternate between inputs based on pull strategy
– Score bounds allow early termination
• Example query: top result from L ⋈ R with scoring function S(bL, bR) = bL + bR

Input L (Bound: 1.8)

| a | bL |
|---|---|
| x | 1.0 |
| y | 0.8 |
| ··· | ··· |

Input R (Bound: 1.7)

| a | bR |
|---|---|
| y | 1.0 |
| z | 0.9 |
| w | 0.7 |
| ··· | ··· |

• Result: y, Score: 1.8
![Page 12: Depth Estimation for Ranking Query Optimization](https://reader036.fdocuments.net/reader036/viewer/2022062322/56814865550346895db57438/html5/thumbnails/12.jpg)
12
HRJN* [IAE04]
• The HRJN* pull strategy:
a) Pull from the input with highest bound
b) If (a) is a tie, pull from input with the smaller number of pulls so far
c) If (b) is a tie, pull from the left
• Example query: top result from L ⋈ R with scoring function S(bL, bR) = bL + bR

Input L

| a | bL |
|---|---|
| x | 1.0 |
| y | 0.8 |

Input R

| a | bR |
|---|---|
| y | 1.0 |
| z | 0.9 |
| w | 0.7 |

– Bound on L: 2.0, then 1.8
– Bound on R: 2.0, then 1.9, then 1.7
• Result: y, Score: 1.8
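
The pull strategy can be traced with a small simulation. The sketch below is an assumption-laden illustration rather than the operator from [IAE04]: it handles only a top-1 query with the additive scoring function of the example, keeps seen tuples in plain lists instead of hash tables, and the function name is made up. It uses only the tuples visible on the slide.

```python
import math


def hrjn_star_top1(left, right):
    """Top-1 rank join of two non-empty inputs sorted by descending score.

    Each input is a list of (join_key, base_score). Returns the best
    (key, score) found and the depths (tuples pulled) on each input.
    """
    inputs = (left, right)
    seen = ([], [])                      # tuples pulled so far per input
    top = (left[0][1], right[0][1])      # maximum base score of each input
    last = [top[0], top[1]]              # most recently pulled score per input
    exhausted = [False, False]
    best = None

    while True:
        # Bound on any result that uses a not-yet-seen tuple of each input.
        bounds = [
            -math.inf if exhausted[0] else last[0] + top[1],
            -math.inf if exhausted[1] else top[0] + last[1],
        ]
        if best is not None and best[1] >= max(bounds):
            break                        # early termination via score bounds
        if all(exhausted):
            break
        # Pull strategy: (a) highest bound, (b) fewer pulls, (c) left.
        if bounds[0] != bounds[1]:
            side = 0 if bounds[0] > bounds[1] else 1
        elif len(seen[0]) != len(seen[1]):
            side = 0 if len(seen[0]) < len(seen[1]) else 1
        else:
            side = 0
        if len(seen[side]) == len(inputs[side]):
            exhausted[side] = True       # nothing left to pull on this side
            continue
        key, s = inputs[side][len(seen[side])]
        seen[side].append((key, s))
        last[side] = s
        for other_key, other_s in seen[1 - side]:
            if other_key == key:
                total = s + other_s
                if best is None or total > best[1]:
                    best = (key, total)
    return best, (len(seen[0]), len(seen[1]))


L = [("x", 1.0), ("y", 0.8)]
R = [("y", 1.0), ("z", 0.9), ("w", 0.7)]
result, depths = hrjn_star_top1(L, R)
```

On these inputs the simulation pulls two tuples from L and three from R before the bounds drop to 1.8 and 1.7, at which point the result y with score 1.8 can be returned.
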
![Page 13: Depth Estimation for Ranking Query Optimization](https://reader036.fdocuments.net/reader036/viewer/2022062322/56814865550346895db57438/html5/thumbnails/13.jpg)
13
Outline
• Preliminaries
• DEEP Framework
• Experimental Results
![Page 14: Depth Estimation for Ranking Query Optimization](https://reader036.fdocuments.net/reader036/viewer/2022062322/56814865550346895db57438/html5/thumbnails/14.jpg)
14
Supported Operators
• Evidence in favor of HRJN*
– Pull strategy has strong properties
  • Within constant factor of optimal cost
  • Optimal for a significant class of inputs
  • More details in the paper
– Efficient in experiments [IAE04]
• DEEP explicitly supports HRJN*
– Easily extended to other join operators
– Selection operators too
![Page 15: Depth Estimation for Ranking Query Optimization](https://reader036.fdocuments.net/reader036/viewer/2022062322/56814865550346895db57438/html5/thumbnails/15.jpg)
15
DEEP: Conceptual View
• Formalization: Depth Computation, defined in terms of Statistical Data Model
• Implementation: Estimation Algorithms, defined in terms of Statistics Interface, over a Data Synopsis
![Page 16: Depth Estimation for Ranking Query Optimization](https://reader036.fdocuments.net/reader036/viewer/2022062322/56814865550346895db57438/html5/thumbnails/16.jpg)
16
Statistics Model
• Statistics yield the distribution of scores for base tables and joins
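
Join distribution FL⋈R

| bL | bR | FL⋈R(bL,bR) |
|---|---|---|
| 1.0 | 1.0 | 6 |
| 1.0 | 0.5 | 4 |
| 1.0 | 0.7 | 3 |
| 0.9 | 0.7 | 2 |
| 0.6 | 0.7 | 2 |

Base table distribution FL

| bL | FL(bL) |
|---|---|
| 1.0 | 5 |
| 0.9 | 2 |
| 0.8 | 3 |
| 0.6 | 12 |
| 0.4 | 8 |

Base table distribution FR

| bR | FR(bR) |
|---|---|
| 1.0 | 3 |
| 0.7 | 1 |
| 0.5 | 2 |
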
![Page 17: Depth Estimation for Ranking Query Optimization](https://reader036.fdocuments.net/reader036/viewer/2022062322/56814865550346895db57438/html5/thumbnails/17.jpg)
17
Statistics Interface
• DEEP accesses statistics with two methods
– getFreq(b): Return frequency of b
– nextScore(b,i): Return next lowest score on dimension i
• Example: for the row b = (1.0, 0.7) of FL⋈R
– getFreq(b) = 3
– nextScore(b,1) = 0.9
– nextScore(b,2) = 0.5

| bL | bR | FL⋈R(bL,bR) |
|---|---|---|
| 1.0 | 1.0 | 6 |
| 1.0 | 0.5 | 4 |
| 1.0 | 0.7 | 3 |
| 0.9 | 0.7 | 2 |
| 0.6 | 0.7 | 2 |

• The interface allows for efficient algorithms
– Abstracts the physical statistics format
– Allows statistics to be generated on-the-fly
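
One way to picture the interface is a thin wrapper over an in-memory frequency table. The sketch below is an assumption: the class name `JoinStats`, the snake_case method names, and the dictionary-backed storage are illustrative and are not DEEP's actual implementation, which the slides say sits on top of a data synopsis. The dimension index is 1-based to match the slide.

```python
class JoinStats:
    """Hypothetical dictionary-backed version of the getFreq/nextScore interface."""

    def __init__(self, freq):
        # freq maps a score vector such as (bL, bR) to its frequency.
        self.freq = dict(freq)
        dims = len(next(iter(self.freq)))
        # Distinct scores that occur on each dimension.
        self.values = [{b[d] for b in self.freq} for d in range(dims)]

    def get_freq(self, b):
        """Frequency of score vector b (0 if b does not occur)."""
        return self.freq.get(tuple(b), 0)

    def next_score(self, b, i):
        """Next lowest score on dimension i (1-based), or None if there is none."""
        lower = [v for v in self.values[i - 1] if v < b[i - 1]]
        return max(lower) if lower else None


stats = JoinStats({
    (1.0, 1.0): 6,
    (1.0, 0.5): 4,
    (1.0, 0.7): 3,
    (0.9, 0.7): 2,
    (0.6, 0.7): 2,
})
b = (1.0, 0.7)
```

With the table from the slide, `stats.get_freq(b)`, `stats.next_score(b, 1)` and `stats.next_score(b, 2)` give 3, 0.9 and 0.5, matching the slide's example.
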
![Page 18: Depth Estimation for Ranking Query Optimization](https://reader036.fdocuments.net/reader036/viewer/2022062322/56814865550346895db57438/html5/thumbnails/18.jpg)
18
Statistics Implementation
• Interface can be implemented over common types of data synopses
• Can use a histogram if
a) Base score function is invertible, or
b) Base score measures distance
• Assume uniformity & independence if
a) Base score function is too complex, or
b) Sufficient statistics are not available
![Page 19: Depth Estimation for Ranking Query Optimization](https://reader036.fdocuments.net/reader036/viewer/2022062322/56814865550346895db57438/html5/thumbnails/19.jpg)
19
Depth Estimation Overview
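• Top-k query plan: rank join ⋈1 combines the output of rank join ⋈2 (left input, depth l1) with table C (right input, depth r1); ⋈2 combines table A (left input, depth l2) with table B (right input, depth r2)

| Estimates made | Value |
|---|---|
| 1. Score of the kth best tuple out of ⋈1 | s1 |
| 2. Depths of ⋈1 needed to output score of s1 | l1 and r1 |
| 3. Score of the l1th best tuple out of ⋈2 | s2 |
| 4. Depths of ⋈2 needed to output score of s2 | l2 and r2 |
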
![Page 20: Depth Estimation for Ranking Query Optimization](https://reader036.fdocuments.net/reader036/viewer/2022062322/56814865550346895db57438/html5/thumbnails/20.jpg)
20
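Estimating Terminal Score
• Suppose we want the 10th best score
• Idea
– Sort by total score
– Sum frequencies

Statistics FL⋈R

| bL | bR | FL⋈R(bL,bR) |
|---|---|---|
| 1.0 | 1.0 | 6 |
| 1.0 | 0.5 | 4 |
| 1.0 | 0.7 | 3 |
| 0.9 | 0.7 | 2 |
| 0.6 | 0.7 | 2 |

Sorted by total score, with the running sum of frequencies

| bL + bR | FL⋈R(bL,bR) | sum |
|---|---|---|
| 2.0 | 6 | 6 |
| 1.7 | 3 | 9 |
| 1.6 | 2 | 11 |
| 1.5 | 4 | |
| 1.3 | 2 | |

Sterm = 1.6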
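
The sort-and-sum idea is short enough to write out directly. This sketch assumes the additive scoring function of the example and a fully materialized frequency table; the function name is illustrative.

```python
def terminal_score(freq, k):
    """Score of the k-th best join result, given frequencies per score vector.

    freq maps (bL, bR) to the number of join results with those base scores.
    The total score is assumed to be the sum of the base scores.
    """
    by_total = sorted(freq.items(), key=lambda item: sum(item[0]), reverse=True)
    running = 0
    for b, f in by_total:
        running += f                 # results with total score >= sum(b)
        if running >= k:
            return sum(b)
    return None                      # fewer than k join results overall


freq = {
    (1.0, 1.0): 6,
    (1.0, 0.5): 4,
    (1.0, 0.7): 3,
    (0.9, 0.7): 2,
    (0.6, 0.7): 2,
}
s_term = terminal_score(freq, 10)
```

For k = 10 the running sum goes 6, 9, 11, so the function stops at total score 1.6, in agreement with the slide.
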
![Page 21: Depth Estimation for Ranking Query Optimization](https://reader036.fdocuments.net/reader036/viewer/2022062322/56814865550346895db57438/html5/thumbnails/21.jpg)
21
Estimation Algorithm
• Idea: Only process necessary statistics
[Figure: the frequencies FL⋈R(bL,bR) arranged on a grid of bL and bR values]

| bL | bR | FL⋈R(bL,bR) |
|---|---|---|
| 1.0 | 1.0 | 6 |
| 1.0 | 0.5 | 4 |
| 1.0 | 0.7 | 3 |
| 0.9 | 0.7 | 2 |
| 0.6 | 0.7 | 2 |

• Algorithm relies solely on getFreq and nextScore
– Avoids materializing complete table
• Worst-case complexity equivalent to sorting table
– More efficient in practice
Sterm = 1.6
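
The slides do not spell out the algorithm, so the following is only one plausible way to compute Sterm through the two interface calls: a best-first walk over score vectors, starting from the highest scores and expanding neighbours with nextScore. The priority queue, the additive scoring function, and all names here are assumptions made for illustration.

```python
import heapq

# Hypothetical in-memory statistics standing in for a data synopsis.
FREQ = {(1.0, 1.0): 6, (1.0, 0.5): 4, (1.0, 0.7): 3, (0.9, 0.7): 2, (0.6, 0.7): 2}
VALUES = [{b[d] for b in FREQ} for d in range(2)]


def get_freq(b):
    return FREQ.get(b, 0)


def next_score(b, i):
    """Next lowest score on dimension i (0-based here), or None."""
    lower = [v for v in VALUES[i] if v < b[i]]
    return max(lower) if lower else None


def terminal_score_lazy(start, k):
    """Visit score vectors in descending total score until k results are covered."""
    heap = [(-sum(start), start)]    # max-heap on total score via negation
    visited = {start}
    covered = 0
    while heap:
        neg_total, b = heapq.heappop(heap)
        covered += get_freq(b)
        if covered >= k:
            return -neg_total
        # Monotonicity: lowering one dimension never raises the total,
        # so neighbours can safely be explored after b.
        for i in range(len(b)):
            lower = next_score(b, i)
            if lower is None:
                continue
            neighbour = b[:i] + (lower,) + b[i + 1:]
            if neighbour not in visited:
                visited.add(neighbour)
                heapq.heappush(heap, (-sum(neighbour), neighbour))
    return None


s_term = terminal_score_lazy((1.0, 1.0), 10)
```

On the example the walk stops once the covered count reaches 11 at total score 1.6, without looking up the rows that score 1.5 or lower.
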
![Page 23: Depth Estimation for Ranking Query Optimization](https://reader036.fdocuments.net/reader036/viewer/2022062322/56814865550346895db57438/html5/thumbnails/23.jpg)
23
Estimating Depth for HRJN*
• Example: Sterm = 1.6

| bL | FL(bL) |
|---|---|
| 1.0 | 5 |
| 0.9 | 2 |
| 0.8 | 3 |
| 0.6 | 12 |
| 0.4 | 8 |

| bL + 1 | FL(bL) |
|---|---|
| 2.0 | 5 |
| 1.9 | 2 |
| 1.8 | 3 |
| 1.6 | 4 |
| 1.4 | 8 |

• Theorem: i ≤ depth of HRJN* ≤ j
– In the example, 11 ≤ depth ≤ 15
[Figure: input scores in decreasing order, each labeled by how its score bound compares with Sterm (> Sterm, = Sterm, < Sterm), with positions i and j marked]
• Estimation algorithm
– Access via getFreq and nextScore
– Similar to estimation of Sterm
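
The transcript does not preserve how i and j are defined, so the sketch below rests on an assumed reading: i is one past the tuples whose score bound exceeds Sterm, and j is one past the tuples whose bound is at least Sterm. That reading reproduces the 11 and 15 of the example when the frequencies from the bL + 1 table are used, but it should be checked against the paper. The function name and the tolerance are illustrative.

```python
EPS = 1e-9  # tolerance for comparing floating-point score bounds


def hrjn_depth_bounds(input_freq, max_other_score, s_term):
    """Assumed reading of the bounds i <= depth <= j for one HRJN* input.

    input_freq maps a base score to its frequency on this input.
    A tuple's score bound is its base score plus the best score of the
    other input (additive scoring, as in the example).
    """
    total = sum(input_freq.values())
    above = sum(f for b, f in input_freq.items()
                if b + max_other_score > s_term + EPS)
    at_least = sum(f for b, f in input_freq.items()
                   if b + max_other_score >= s_term - EPS)
    # One extra pull is assumed to be needed to observe the bound dropping.
    i = min(above + 1, total)
    j = min(at_least + 1, total)
    return i, j


# Frequencies as listed in the bL + 1 table of the example.
left_freq = {1.0: 5, 0.9: 2, 0.8: 3, 0.6: 4, 0.4: 8}
bounds = hrjn_depth_bounds(left_freq, max_other_score=1.0, s_term=1.6)
```

Here ten tuples have a bound above 1.6 and four more have a bound equal to 1.6, which yields the pair (11, 15) under this reading.
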
![Page 24: Depth Estimation for Ranking Query Optimization](https://reader036.fdocuments.net/reader036/viewer/2022062322/56814865550346895db57438/html5/thumbnails/24.jpg)
24
Outline
• Preliminaries
• DEEP Framework
• Experimental Results
![Page 25: Depth Estimation for Ranking Query Optimization](https://reader036.fdocuments.net/reader036/viewer/2022062322/56814865550346895db57438/html5/thumbnails/25.jpg)
25
Experimental Setting
• TPC-H data set
– Total size of 1 GB
– Varying amount of skew
• Workloads of 250 queries
– Top-10, top-100, top-1000 queries
– One or two joins per query
• Error metric: absolute relative error
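
The slides name the metric without defining it. A minimal sketch, assuming the usual definition |estimate − actual| / actual applied to depths; the function name and the sample numbers in the note below are hypothetical.

```python
def absolute_relative_error(estimated, actual):
    """Absolute relative error of an estimate, as a fraction of the actual value."""
    if actual == 0:
        raise ValueError("actual value must be non-zero")
    return abs(estimated - actual) / actual
```

For instance, an estimated depth of 128 against an actual depth of 100 gives 0.28, which would be reported as 28%.
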
![Page 26: Depth Estimation for Ranking Query Optimization](https://reader036.fdocuments.net/reader036/viewer/2022062322/56814865550346895db57438/html5/thumbnails/26.jpg)
26
Depth Estimation Techniques
• DEEP
– Uses 150 KB TuG synopsis [SP06]
• Probabilistic [IAE04]
– Uses same TuG synopsis
– Modified to handle single-join queries with varying table sizes
• Sampling [LCIS05]
– 5% sample = 4.6 MB
![Page 27: Depth Estimation for Ranking Query Optimization](https://reader036.fdocuments.net/reader036/viewer/2022062322/56814865550346895db57438/html5/thumbnails/27.jpg)
27
Error for Varying Skew
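[Figure: bar chart of ARE (%) on a logarithmic Percentage Error axis (1 to 1000) against the Zipfian Skew Parameter z (0, 0.5, 1, 1.5), with series DEEP, Probabilistic, and Sampling. Bar labels, in extraction order: 342%, 3%, 12%, 2%, 18%, 16%, 44%, 39%, 489%, 501%, 297%]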
![Page 28: Depth Estimation for Ranking Query Optimization](https://reader036.fdocuments.net/reader036/viewer/2022062322/56814865550346895db57438/html5/thumbnails/28.jpg)
28
Error at Each Input
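[Figure: bar chart of ARE (%) on a logarithmic Percentage Error axis (1 to 10000) against Input of Two-Join Query (inputs 1 to 7), with series DEEP and Sampling. Bar labels, in extraction order: 697%, 683%, 696%, 679%, 284%, 695%, 644%, 28%, 28%, 28%, 28%, 28%, 25%, 35%]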
![Page 29: Depth Estimation for Ranking Query Optimization](https://reader036.fdocuments.net/reader036/viewer/2022062322/56814865550346895db57438/html5/thumbnails/29.jpg)
29
Conclusions
• Depth estimation is necessary to optimize relational ranking queries
• DEEP is a principled and practical solution
– Takes data distribution into account
– Applies to many common scenarios
– Integrates with data summarization techniques
• New theoretical results for HRJN*
• Next steps
– Accuracy guarantees
– Data synopses for complex base scores (especially text predicates)
![Page 30: Depth Estimation for Ranking Query Optimization](https://reader036.fdocuments.net/reader036/viewer/2022062322/56814865550346895db57438/html5/thumbnails/30.jpg)
30
Thank You
![Page 31: Depth Estimation for Ranking Query Optimization](https://reader036.fdocuments.net/reader036/viewer/2022062322/56814865550346895db57438/html5/thumbnails/31.jpg)
31
Related Work
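• Selectivity estimation is a similar idea
• It is the inverse problem
– Selectivity Estimation: the depths are given (depth = |L|, depth = |R|) and the selectivity of L ⋈ R is unknown
– Depth Estimation: the selectivity is given (selectivity = k) and the depths on L and R are unknown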
![Page 32: Depth Estimation for Ranking Query Optimization](https://reader036.fdocuments.net/reader036/viewer/2022062322/56814865550346895db57438/html5/thumbnails/32.jpg)
32
Other Features
• DEEP can be extended to NLRJ and selection operators
• DEEP can be extended to other pulling strategies
– Block-based HRJN*
– Block-based alternation
![Page 33: Depth Estimation for Ranking Query Optimization](https://reader036.fdocuments.net/reader036/viewer/2022062322/56814865550346895db57438/html5/thumbnails/33.jpg)
33
Analysis of HRJN*
• Within the class of all HRJN variants:
– HRJN* is optimal for many cases
  • With no ties of score bound between inputs
  • With no ties of score bound within one input
– HRJN* is instance optimal