Date posted: 20-Dec-2015
Efficient Skyline Querying with Variable User Preferences on Nominal Attributes
Raymond Chi-Wing Wong (1), Ada Wai-Chee Fu (2), Jian Pei (3), Yip Sing Ho (2), Tai Wong (2) and Yubao Liu (4)
(1) The Hong Kong University of Science and Technology
(2) The Chinese University of Hong Kong
(3) Simon Fraser University
(4) Sun Yat-Sen University
Prepared by Raymond Chi-Wing Wong
Presented by Raymond Chi-Wing Wong
Outline
1. Introduction
   a. Skyline
   b. Contributions
2. Problem Definition
3. Adaptive SFS
4. IPO-Tree
5. Conclusion
1. Introduction
Suppose we want to look for a vacation package.

3 packages:
Package ID  Price  Hotel-class
a           1600   4
b           2400   1
c           3000   5

We want to have a cheaper package.
We want to have a higher hotel-class.

Package a “dominates” package b. We know that
1. Package a has a cheaper price
2. Package a has a higher hotel-class

We want to find the set of packages which are NOT dominated by any other packages, i.e., all of the “best” possible choices: {a, c}.
This set is the skyline.
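The dominance test and the resulting skyline for the three packages can be sketched as follows. This is only an illustration: the names `dominates` and `skyline` are chosen here and do not come from the slides.

```python
from typing import Dict, List, Tuple

# (price, hotel_class) per package, copied from the table above.
packages: Dict[str, Tuple[int, int]] = {
    "a": (1600, 4),
    "b": (2400, 1),
    "c": (3000, 5),
}


def dominates(p: Tuple[int, int], q: Tuple[int, int]) -> bool:
    """p dominates q: no worse on both attributes, strictly better on one.

    A lower price is better; a higher hotel-class is better.
    """
    no_worse = p[0] <= q[0] and p[1] >= q[1]
    strictly_better = p[0] < q[0] or p[1] > q[1]
    return no_worse and strictly_better


def skyline(points: Dict[str, Tuple[int, int]]) -> List[str]:
    """Return the ids of all points that no other point dominates."""
    return sorted(
        pid
        for pid, p in points.items()
        if not any(dominates(q, p) for qid, q in points.items() if qid != pid)
    )


print(skyline(packages))  # ['a', 'c']
```

Package b is dropped because a is both cheaper and of a higher class; a and c survive because neither beats the other on both attributes.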
1. Introduction
Suppose we want to look for a vacation package.

6 packages:
Package ID  Price  Hotel-class  Hotel-group
a           1600   4            T (Tulips)
b           2400   1            T (Tulips)
c           3000   5            H (Horizon)
d           3600   4            H (Horizon)
e           2400   2            M (Mozilla)
f           3000   3            M (Mozilla)

We want to have a cheaper package.
We want to have a higher hotel-class.
How about this one (Hotel-group)?
Different customers may have different preferences on Hotel-group.

Suppose a customer has the following preference: H < T < M.
The skyline points are packages a and c.
Suppose another customer has the following preference: H < M < T.
The skyline points are packages a, c and e.
In other words, different preferences give different skyline points.
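The effect of the Hotel-group preference can be checked with a small brute-force sketch. It assumes a preference is given as a list of groups from most to least preferred (so H < T < M becomes `["H", "T", "M"]`); the function names are illustrative only.

```python
from typing import Dict, List, Tuple

Package = Tuple[int, int, str]  # (price, hotel_class, hotel_group)

packages: Dict[str, Package] = {
    "a": (1600, 4, "T"),
    "b": (2400, 1, "T"),
    "c": (3000, 5, "H"),
    "d": (3600, 4, "H"),
    "e": (2400, 2, "M"),
    "f": (3000, 3, "M"),
}


def dominates(p: Package, q: Package, rank: Dict[str, int]) -> bool:
    """p dominates q under a total order on Hotel-group (lower rank is better)."""
    no_worse = p[0] <= q[0] and p[1] >= q[1] and rank[p[2]] <= rank[q[2]]
    strictly = p[0] < q[0] or p[1] > q[1] or rank[p[2]] < rank[q[2]]
    return no_worse and strictly


def skyline(points: Dict[str, Package], preference: List[str]) -> List[str]:
    """Brute-force skyline for one customer's ordering of the hotel groups."""
    rank = {group: i for i, group in enumerate(preference)}
    return sorted(
        pid
        for pid, p in points.items()
        if not any(dominates(q, p, rank) for q in points.values())
    )


print(skyline(packages, ["H", "T", "M"]))  # ['a', 'c']
print(skyline(packages, ["H", "M", "T"]))  # ['a', 'c', 'e']
```

Under H < T < M, package e is dominated by a (cheaper, higher class, preferred group); under H < M < T the group comparison flips and e stays in the skyline.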
Problem: Given a preference on Hotel-group, we want to find the skyline with respect to this preference efficiently.
Straightforward solution:
Adopt some existing skyline techniques such as SFS (Sort-First Skyline) to compute the skyline on-the-fly when we need to perform a skyline query
It works. However, this solution is not scalable and the results cannot be returned efficiently.
Full Materialization solution:
Pre-computation: For each possible preference, (1) pre-compute the skyline and (2) store it.
Skyline Query: return the stored skyline directly for a skyline query.
It works when there is a limited number of preferences.
However, this solution is not scalable when there are a lot of possible preferences.
e.g., with three nominal attributes (like Hotel-group), each of which contains 40 possible values, there are 4.1 x 10^9 possible preferences (in our problem setting).
Semi-Materialization solution:
Pre-computation: For SOME possible preferences, (1) pre-compute the skyline and (2) store it.
Skyline Query: return the stored skyline directly OR after some simple operations for a skyline query.
Good tradeoff between storage consumption and efficiency.
Adaptive SFS
IPO-Tree (Implicit Preference Order Tree)
Questions:
1. What preferences should be stored?
2. With these preferences, how can we perform a skyline query efficiently?
1. Contributions
Most Existing Work
Assumes that each attribute has a fixed ordering (either totally ordered or partially ordered) on the attribute values.
Our Work
Different users can have different preferences (i.e., the ordering on attribute values differs from user to user).
We propose a semi-materialization method, IPO-tree, to answer the skyline query efficiently.
2. Problem Definition
Usually, a user would NOT specify an ordering on all possible values of attribute Hotel-group.
The user only lists a few of the most favorite choices, e.g., M < H < *
This is an implicit preference.
A user prefers M to H.
A user prefers H to *.
Here * stands for all possible values in attribute Hotel-group other than “M” and “H” (in this case, “T”).
This is the reason why we call it an implicit preference.
Problem: Given an implicit preference on Hotel-group, we want to find the skyline with respect to this preference efficiently.
Binary orders = {M < H, M < T, H < T}
Since the user gives only TWO choices, we define the order of this preference to be TWO.
We also call this preference a second-order implicit preference.
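The expansion of an implicit preference into its binary orders can be sketched as below. The helper name `binary_orders` and the representation (a list of the explicitly listed values plus the attribute domain) are assumptions made for this illustration.

```python
from typing import List, Set, Tuple


def binary_orders(listed: List[str], domain: List[str]) -> Set[Tuple[str, str]]:
    """Expand 'v1 < v2 < ... < *' into pairs (better, worse).

    Each listed value is preferred to every value that is not listed
    before it; unlisted values stay incomparable with each other.
    """
    orders = set()
    for i, better in enumerate(listed):
        for worse in domain:
            if worse != better and worse not in listed[:i]:
                orders.add((better, worse))
    return orders


domain = ["T", "H", "M"]
second_order = binary_orders(["M", "H"], domain)   # M < H < *
first_order = binary_orders(["M"], domain)         # M < *
print(sorted(second_order))  # [('H', 'T'), ('M', 'H'), ('M', 'T')]
print(sorted(first_order))   # [('M', 'H'), ('M', 'T')]
```

The order of the preference is simply the number of listed values, here `len(["M", "H"]) == 2`.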
Idea of our proposed semi-materialization IPO-tree
1. Store the skyline wrt the first-order implicit preference ONLY
2. Find the skyline wrt the implicit preference of any ordering from the skyline wrt the first-order implicit preference
3. Adaptive SFS
Original SFS
Idea: Suppose we have a function f.
- Each tuple is assigned a score obtained by f.
- Sort the tuples in ascending order of the scores.
- Process the tuples in this order.
Adaptive SFS
Similar idea. However, the original score function is based on numeric attributes, NOT nominal attributes.
What we change is the score function.
Idea:
1. Pre-computation: first pre-sort the tuples according to this new score function.
2. Skyline Query: re-sort the tuples for a skyline query.
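A minimal sketch of the original SFS idea on numeric attributes only. It makes several assumptions that are not in the slides: the score function f is the plain sum of the attribute values (any monotone function would serve), Hotel-class is negated so that smaller is better in every dimension, and the names `sfs` and `dominates` are illustrative.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[str, Tuple[float, ...]]  # (id, values); smaller is better


def dominates(p: Sequence[float], q: Sequence[float]) -> bool:
    """p dominates q: no worse everywhere and strictly better somewhere."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def sfs(points: List[Point], score: Callable[[Sequence[float]], float] = sum) -> List[str]:
    """Sort by a monotone score, then scan once.

    After sorting, a point can only be dominated by points that come
    before it, so accepted points never have to be removed again.
    """
    ordered = sorted(points, key=lambda item: score(item[1]))
    window: List[Point] = []
    for pid, values in ordered:
        if not any(dominates(kept, values) for _, kept in window):
            window.append((pid, values))
    return [pid for pid, _ in window]


# (price, -hotel_class) for the six packages; the negation is an assumption
# made here so that both attributes are minimised.
data = [
    ("a", (1600, -4)), ("b", (2400, -1)), ("c", (3000, -5)),
    ("d", (3600, -4)), ("e", (2400, -2)), ("f", (3000, -3)),
]
print(sfs(data))  # ['a', 'c']
```

With the nominal attribute ignored, only a and c remain; handling Hotel-group is exactly what the adapted score function adds.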
4. IPO-Tree
Package ID  Price  Hotel-class  Hotel-group
a           1600   4            T (Tulips)
b           2400   1            T (Tulips)
c           3000   5            H (Horizon)
d           3600   4            H (Horizon)
e           2400   2            M (Mozilla)
f           3000   3            M (Mozilla)

M < *
SKY1 = {a, c, e, f}
Binary Orders: {M < T, M < H}
(* stands for the values other than “M”, i.e., “H” and “T”)
H < *
SKY2 = {a, c, e}
Binary Orders: {H < T, H < M}
(* stands for the values other than “H”, i.e., “T” and “M”)
f is NOT a skyline point. Why?
With the binary order H < M, c dominates f.
We say that “H < M” disqualifies f as a skyline point.
M < H < *
Binary Orders: {M < H, M < T, H < T}
(* stands for the values other than “M” and “H”, i.e., “T”)
PSKY1 = the set of data points in SKY1 with value “M” = {e, f}
SKY3 = (SKY1 ∩ SKY2) ∪ PSKY1
     = {a, c, e} ∪ {e, f}
     = {a, c, e, f}
Additional binary order! SKY2 is computed with the binary order H < M, which is not among the binary orders of M < H < *.
This binary order may disqualify some data points that belong to SKY3, like “f”.
Observation: These points must be in PSKY1.
4. IPO-Tree
M < *:     SKY1 = {a, c, e, f}  (skyline wrt the first-order preference)
H < *:     SKY2 = {a, c, e}     (skyline wrt the first-order preference)
M < H < *: SKY3 = {a, c, e, f}  (skyline wrt the second-order preference)
Merging Property: the skyline wrt v1 < v2 < * is obtained from the skylines wrt v1 < * and v2 < *.
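The merging step for the running example can be checked with a short sketch. The two first-order skylines are computed by brute force here as a stand-in for skylines that would already be stored; all function names are illustrative, not taken from the slides.

```python
from typing import Dict, List, Set, Tuple

Package = Tuple[int, int, str]  # (price, hotel_class, hotel_group)

PACKAGES: Dict[str, Package] = {
    "a": (1600, 4, "T"), "b": (2400, 1, "T"), "c": (3000, 5, "H"),
    "d": (3600, 4, "H"), "e": (2400, 2, "M"), "f": (3000, 3, "M"),
}
DOMAIN = ("T", "H", "M")


def binary_orders(listed: List[str]) -> Set[Tuple[str, str]]:
    """Binary orders (better, worse) implied by 'v1 < v2 < ... < *'."""
    return {
        (better, worse)
        for i, better in enumerate(listed)
        for worse in DOMAIN
        if worse != better and worse not in listed[:i]
    }


def dominates(p: Package, q: Package, orders: Set[Tuple[str, str]]) -> bool:
    """Dominance with a partial order on Hotel-group."""
    group_ok = p[2] == q[2] or (p[2], q[2]) in orders
    return p != q and p[0] <= q[0] and p[1] >= q[1] and group_ok


def skyline(listed: List[str]) -> Set[str]:
    """Brute-force skyline under the implicit preference 'listed < *'."""
    orders = binary_orders(listed)
    return {
        pid for pid, q in PACKAGES.items()
        if not any(dominates(p, q, orders) for p in PACKAGES.values())
    }


sky1 = skyline(["M"])                                   # M < *
sky2 = skyline(["H"])                                   # H < *
psky1 = {pid for pid in sky1 if PACKAGES[pid][2] == "M"}
sky3 = (sky1 & sky2) | psky1                            # merging property

print(sorted(sky1), sorted(sky2), sorted(psky1))
# ['a', 'c', 'e', 'f'] ['a', 'c', 'e'] ['e', 'f']
print(sorted(sky3), sky3 == skyline(["M", "H"]))
# ['a', 'c', 'e', 'f'] True
```

Only set intersection and union are needed at query time, which is why storing the first-order skylines is enough for this second-order query.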
4. IPO-Tree
Second-order Preference
v1 < v2 < * (skyline wrt the second-order preference) is obtained from
  v1 < * (skyline wrt the first-order preference)
  v2 < * (skyline wrt the first-order preference)
Third-order Preference
v1 < v2 < v3 < * (skyline wrt the third-order preference) is obtained from
  v1 < v2 < * (skyline wrt the second-order preference)
  v3 < * (skyline wrt the first-order preference)
Fourth-order Preference
v1 < v2 < v3 < v4 < * (skyline wrt the fourth-order preference) is obtained from
  v1 < v2 < v3 < * (skyline wrt the third-order preference)
  v4 < * (skyline wrt the first-order preference)
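The repeated merging for higher orders can be sketched as a loop over the listed values. Everything below is a toy setup assumed for illustration: a synthetic data set with five hypothetical nominal values, two numeric attributes where smaller is better, and first-order skylines kept in a dictionary `STORED` as a stand-in for materialized results.

```python
import random
from itertools import permutations
from typing import Dict, FrozenSet, List, Set, Tuple

DOMAIN = ("A", "B", "C", "D", "E")   # hypothetical nominal values
rng = random.Random(7)               # fixed seed: reproducible toy data
POINTS: List[Tuple[int, int, str]] = [
    (rng.randint(1, 6), rng.randint(1, 6), rng.choice(DOMAIN)) for _ in range(40)
]


def binary_orders(listed: Tuple[str, ...]) -> Set[Tuple[str, str]]:
    """Binary orders (better, worse) implied by 'v1 < ... < vx < *'."""
    return {
        (better, worse)
        for i, better in enumerate(listed)
        for worse in DOMAIN
        if worse != better and worse not in listed[:i]
    }


def dominates(p, q, orders) -> bool:
    nominal_ok = p[2] == q[2] or (p[2], q[2]) in orders
    return p != q and p[0] <= q[0] and p[1] <= q[1] and nominal_ok


def brute_skyline(listed: Tuple[str, ...]) -> FrozenSet[int]:
    orders = binary_orders(listed)
    return frozenset(
        i for i, q in enumerate(POINTS)
        if not any(dominates(p, q, orders) for p in POINTS)
    )


# Pre-computation: one stored skyline per first-order preference 'v < *'.
STORED: Dict[str, FrozenSet[int]] = {v: brute_skyline((v,)) for v in DOMAIN}


def merged_skyline(listed: Tuple[str, ...]) -> FrozenSet[int]:
    """Skyline wrt 'v1 < ... < vx < *' using only the stored skylines."""
    sky = STORED[listed[0]]
    for k in range(1, len(listed)):
        # Points of the current skyline whose value is one of v1..vk.
        psky = {i for i in sky if POINTS[i][2] in listed[:k]}
        sky = (sky & STORED[listed[k]]) | psky
    return sky


query = ("C", "A", "E")  # third-order preference C < A < E < *
print(merged_skyline(query) == brute_skyline(query))  # True
```

Each additional listed value costs one intersection and one union on top of the result for the shorter preference.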
5. Empirical Study
Datasets
  Synthetic Dataset: Anti-correlated dataset
  Real Dataset (from UCI): Nursery Dataset
Default Values (Synthetic)
  No. of tuples = 500K
  No. of numeric dimensions = 3
  No. of nominal dimensions = 2
  No. of values in a nominal dimension = 20
  Order of implicit preference = 3
5. Empirical Study
Variation
  No. of data points
  No. of numeric dimensions
  No. of nominal dimensions
  Cardinality of nominal dimensions
  Order of implicit preference
Comparison
  SFS-D: Original SFS
  SFS-A: Adaptive SFS
  IPO Tree
  IPO Tree-10: IPO Tree which stores the 10 most frequent values for each nominal attribute (for comparison)
5. Empirical Study: Synthetic Data Set
5. Empirical Study: Real Data Set
6. Conclusion
Different customers have different preferences, and hence different skylines
Skyline Query on Nominal Attributes
  Adaptive SFS algorithm
  IPO-Tree algorithm
  Experiments
Q&A
3. Adaptive SFS
Hotel-class is replaced by Reverse Hotel-class (5 minus Hotel-class in this example), so that a smaller value is better in every numeric attribute.

Package ID  Price  Reverse Hotel-class  Hotel-group
a           1600   1                    T (Tulips)
b           2400   4                    T (Tulips)
c           3000   0                    H (Horizon)
d           3600   1                    H (Horizon)
e           2400   3                    M (Mozilla)
f           3000   2                    M (Mozilla)
Using some existing algorithms, we can first remove the data points which cannot be in the skyline with respect to any implicit preference (here, b and d).
3. Adaptive SFS
Step 1 (Pre-computation): pre-sort the tuples according to the new score function.
Each value in attribute Hotel-group is assigned a SPECIAL value.
This special value is set to the total number of possible values in Hotel-group (i.e., 3).

Score of point a is 1600 + 1 + 3 = 1604
Score of point c is 3000 + 0 + 3 = 3003

Package ID  Score
a           1604
c           3003
e           2406
f           3005

Sorted in ascending order of score:
Package ID  Score
a           1604
e           2406
c           3003
f           3005
3. Adaptive SFS
Step 2 (Skyline Query): re-sort the tuples for a skyline query (e.g., H < T < *).
Value “H” is assigned value 1.
Value “T” is assigned value 2.
All values other than “H” and “T” (i.e., “M”) still have value 3.

Score of point a is 1600 + 1 + 2 = 1603
Score of point c is 3000 + 0 + 1 = 3001

Package ID  Score (pre-computation)  Score (skyline query)
a           1604                     1603
e           2406                     2406
c           3003                     3001
f           3005                     3005

Since the scores of a and c are updated, we need to re-sort a and c.
Note that the ordering of all OTHER points, which contain neither “H” nor “T”, remains unchanged.
We then just use the original SFS.
With this sorted list, we find the skyline = {a, c}.
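The two steps can be sketched as follows. This is a simplified illustration, not the authors' implementation: the query step re-sorts the whole pre-sorted list instead of repositioning only the tuples whose scores changed, and the names `score`, `dominates` and `adaptive_sfs` are chosen here.

```python
from typing import Dict, List, Optional, Tuple

Tuple3 = Tuple[int, int, str]  # (price, reverse_hotel_class, hotel_group)

# b and d are assumed to have been removed already, as on the slides.
TUPLES: Dict[str, Tuple3] = {
    "a": (1600, 1, "T"), "c": (3000, 0, "H"),
    "e": (2400, 3, "M"), "f": (3000, 2, "M"),
}
DOMAIN_SIZE = 3  # number of possible values in Hotel-group


def score(t: Tuple3, rank: Dict[str, int]) -> int:
    """Numeric values plus the rank of the group; unlisted groups get DOMAIN_SIZE."""
    return t[0] + t[1] + rank.get(t[2], DOMAIN_SIZE)


def dominates(p: Tuple3, q: Tuple3, rank: Dict[str, int]) -> bool:
    rp: Optional[int] = rank.get(p[2])
    rq: Optional[int] = rank.get(q[2])
    # A listed value beats every unlisted value and every later listed value.
    group_ok = p[2] == q[2] or (rp is not None and (rq is None or rp < rq))
    return p != q and p[0] <= q[0] and p[1] <= q[1] and group_ok


# Step 1 (pre-computation): every group still has the special value 3.
PRESORTED: List[str] = sorted(TUPLES, key=lambda pid: score(TUPLES[pid], {}))


def adaptive_sfs(listed: List[str]) -> List[str]:
    """Step 2 (skyline query) for the implicit preference 'listed < *'."""
    rank = {value: i + 1 for i, value in enumerate(listed)}
    # Simplification: a full stable re-sort of the pre-sorted list.
    order = sorted(PRESORTED, key=lambda pid: score(TUPLES[pid], rank))
    window: List[str] = []
    for pid in order:
        if not any(dominates(TUPLES[w], TUPLES[pid], rank) for w in window):
            window.append(pid)
    return window


print(PRESORTED)                 # ['a', 'e', 'c', 'f']
print(adaptive_sfs(["H", "T"]))  # ['a', 'c']
```

The score stays monotone with respect to dominance because a dominating tuple is no larger in every summand and strictly smaller in at least one, so the single scan of the original SFS remains valid.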
4. IPO-Tree
Idea
Pre-computation: Store the skyline wrt the first-order preference.
Skyline Query: Find the skyline wrt the preference of any order according to the stored skylines wrt the first-order preference.
e.g. 1  Hotel-group: M < *            Airline: G < *
e.g. 2  Hotel-group: M < *            Airline: (no preference)
e.g. 3  Hotel-group: (no preference)  Airline: G < *
How can we do it efficiently?
We propose an indexing structure called IPO-tree.
4. IPO-Tree
Package ID  Price  Reverse Hotel-class  Hotel-group  Airline
a           1600   1                    T (Tulips)   G (Gonna)
b           2400   4                    T (Tulips)   G (Gonna)
c           3000   0                    H (Horizon)  G (Gonna)
d           3600   1                    H (Horizon)  R (Redish)
e           2400   3                    M (Mozilla)  R (Redish)
f           3000   2                    M (Mozilla)  W (Wings)

IPO-tree:
root
  Level 1 (Hotel-group): T < *, H < *, M < *, (no preference)
  Level 2 (Airline), under each level-1 node: G < *, R < *, W < *, (no preference)
Each level-2 node corresponds to one combination, e.g.
  Hotel-group: T < *, Airline: G < *
  Hotel-group: T < *, Airline: (no preference)
  Hotel-group: (no preference), Airline: G < *

e.g. three nominal attributes (like Hotel-group), each of which contains 40 possible values
Full Materialization: there are 4.1 x 10^9 possible preferences (in our problem setting).
Semi-Materialization IPO-tree: there are 70,644 nodes (which is significantly smaller than 4.1 x 10^9).
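The node count can be reproduced with a short calculation. It assumes that every node has one child per attribute value plus one child for "no preference", and that the tree has one level per nominal attribute below the root; this reading is an inference from the figures on the slide, and the function name is illustrative.

```python
def ipo_tree_nodes(num_nominal_attrs: int, values_per_attr: int) -> int:
    """Number of nodes in a tree with one level per nominal attribute.

    Assumption: fan-out = values_per_attr + 1, where the extra child
    stands for "no preference" on that attribute.
    """
    fanout = values_per_attr + 1
    return sum(fanout ** level for level in range(num_nominal_attrs + 1))


print(ipo_tree_nodes(3, 40))  # 70644
print(ipo_tree_nodes(2, 3))   # 21
```

For the slide's setting this gives 1 + 41 + 41^2 + 41^3 = 70,644, matching the stated figure; the small Hotel-group and Airline example would need 21 nodes.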
4. IPO-Tree
One nominal attribute
  Merging Property
Multiple nominal attributes
  Consider ONE nominal attribute at a time with the Merging Property
  Fix the ordering of the OTHER nominal attributes
  Then, consider each of the other nominal attributes with the Merging Property
4. IPO-Tree
Package ID  Price  Hotel-class  Hotel-group  Airline
a           1600   4            T (Tulips)   G (Gonna)
b           2400   1            T (Tulips)   G (Gonna)
c           3000   5            H (Horizon)  G (Gonna)
d           3600   4            H (Horizon)  R (Redish)
e           2400   2            M (Mozilla)  R (Redish)
f           3000   3            M (Mozilla)  W (Wings)

Hotel-group: M < H < *, Airline: G < R < *
  Hotel-group: M < *, Airline: G < R < *
    Hotel-group: M < *, Airline: G < *
    Hotel-group: M < *, Airline: R < *
  Hotel-group: H < *, Airline: G < R < *
    Hotel-group: H < *, Airline: G < *
    Hotel-group: H < *, Airline: R < *
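The decomposition above can be sketched as a recursion that splits one nominal attribute at a time while the other is held fixed. The code is an illustration under assumptions: first-order skylines are computed on demand by brute force as a stand-in for reading an IPO-tree node, an empty tuple means "no preference", and all names are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

# (price, hotel_class, hotel_group, airline), copied from the table above.
PACKAGES: Dict[str, Tuple[int, int, str, str]] = {
    "a": (1600, 4, "T", "G"), "b": (2400, 1, "T", "G"), "c": (3000, 5, "H", "G"),
    "d": (3600, 4, "H", "R"), "e": (2400, 2, "M", "R"), "f": (3000, 3, "M", "W"),
}

Pref = Tuple[str, ...]      # listed values of 'v1 < v2 < ... < *'
Query = Tuple[Pref, Pref]   # (Hotel-group preference, Airline preference)


def at_least_as_good(u: str, v: str, listed: Pref) -> bool:
    """u equals v, or u is preferred to v under 'listed < *'."""
    if u == v:
        return True
    if u not in listed:
        return False
    return v not in listed or listed.index(u) < listed.index(v)


def dominates(p, q, query: Query) -> bool:
    nominal_ok = all(at_least_as_good(p[2 + i], q[2 + i], query[i]) for i in range(2))
    return p != q and p[0] <= q[0] and p[1] >= q[1] and nominal_ok


def brute_skyline(query: Query) -> FrozenSet[str]:
    return frozenset(
        pid for pid, q in PACKAGES.items()
        if not any(dominates(p, q, query) for p in PACKAGES.values())
    )


LEAF_LOOKUPS: List[Query] = []  # records which stored skylines were read


def stored_skyline(query: Query) -> FrozenSet[str]:
    """Stand-in for reading one IPO-tree node (order at most one per attribute)."""
    LEAF_LOOKUPS.append(query)
    return brute_skyline(query)


def merged_skyline(query: Query) -> FrozenSet[str]:
    for i, listed in enumerate(query):
        if len(listed) >= 2:
            # Split attribute i; the other attribute keeps its ordering.
            head = query[:i] + (listed[:-1],) + query[i + 1:]
            tail = query[:i] + (listed[-1:],) + query[i + 1:]
            sky_head, sky_tail = merged_skyline(head), merged_skyline(tail)
            psky = {pid for pid in sky_head if PACKAGES[pid][2 + i] in listed[:-1]}
            return (sky_head & sky_tail) | psky
    return stored_skyline(query)


QUERY: Query = (("M", "H"), ("G", "R"))  # Hotel-group: M < H < *, Airline: G < R < *
result = merged_skyline(QUERY)
print(sorted(result), len(LEAF_LOOKUPS))  # ['a', 'c', 'e', 'f'] 4
```

The four lookups correspond to the four leaves of the decomposition: (M < *, G < *), (M < *, R < *), (H < *, G < *) and (H < *, R < *).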
4. IPO-Tree
Theorem: Given a user query with an x-th order implicit preference on m’’ nominal attributes, the number of set operations required is O(x^m’’).
e.g. Hotel-group: M < H < *, Airline: G < R < *
m’’ = 2, x = 2
No. of set operations = O(2^2)