Skyline Ranking à la IR
George Valkanas (1), Apostolos N. Papadopoulos (2), Dimitrios Gunopulos (1)
(1) University of Athens, Greece
(2) Aristotle University of Thessaloniki, Greece
1st ExploreDB Workshop, Athens, Greece, 28th March 2014
Skyline Problem Introduction
• Dataset $D = (p_1, p_2, \ldots, p_n)$ in $d$-dimensional space
• Preferences for each dimension: min, max
• $p$ dominates $q$ iff $p_i \le q_i \;\; \forall i = 1, \ldots, d$ and $\exists j: p_j < q_j$ (stated for min preferences)
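The dominance definition can be sketched in a few lines of Python. This is a minimal illustration that assumes min preferences on every dimension; the function names and the hotel data are hypothetical and do not come from the slides.

```python
from typing import List, Sequence

Point = Sequence[float]


def dominates(p: Point, q: Point) -> bool:
    """True if p dominates q, assuming smaller is better on every dimension."""
    no_worse = all(pi <= qi for pi, qi in zip(p, q))
    strictly_better = any(pi < qi for pi, qi in zip(p, q))
    return no_worse and strictly_better


def skyline(points: List[Point]) -> List[Point]:
    """Naive O(n^2) skyline: keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical data: (price, distance to the beach), both to be minimized.
hotels = [(50, 8), (60, 3), (80, 1), (70, 5), (90, 9)]
print(skyline(hotels))
```

Here (70, 5) drops out because (60, 3) is no worse on both dimensions and strictly better on at least one. A max preference on a dimension can be handled by negating that coordinate before calling `dominates`.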
Usefulness of Skyline
• Multi-Objective optimization
• Exploratory Search
• Improve Recommendations
• Data summarization technique
• Building block for defining competitiveness
Skyline Cardinality Explosion
$O((\ln n)^{d-1})$
• Skyline becomes too large to inspect manually
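A quick way to see the growth is to count skyline points on synthetic data as the number of dimensions increases. The sketch below assumes uniformly random, independent dimensions and min preferences; the helper names, the sample size, and the seed are illustrative choices, not taken from the slides.

```python
import random


def dominates(p, q):
    """True if p dominates q, assuming smaller is better on every dimension."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline_size(points):
    """Count the points that no other point dominates (naive O(n^2) scan)."""
    return sum(1 for p in points if not any(dominates(q, p) for q in points))


def random_points(n, d, seed=0):
    """n points with d independent, uniformly distributed coordinates."""
    rng = random.Random(seed)
    return [tuple(rng.random() for _ in range(d)) for _ in range(n)]


# Same number of points, increasing dimensionality.
for d in (2, 3, 4, 5):
    print(d, skyline_size(random_points(300, d)))
```

On data like this the count typically rises quickly with `d`, which is the effect the bound describes.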
Solving the Cardinality Problem
• Select subset of size k
– Coverage-based
– Contour representation
– Diversification
• Ranking
– Top-k Dominating
– Subspace dominance
Skyline + IR: Intuition
• Dominated points are not equally important
• Scheme similar to TF-IDF
Skyline + IR: How ?
• 2 Factors
– DP (~ tf)
– IDP (~ idf)
• DP-IDP
Ranking the Skyline
• Baseline:
– For each skyline point (sp), iterate over its dominated points and SUM
– Slow
– Unnecessary computations
• Alternative? Bound the score
– Lower
– Upper
– Prune skyline points
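The baseline (iterate over each skyline point's dominated points and sum their contributions) can be sketched as follows. The slides do not give the exact dp and idp formulas, so the weights below are assumptions made by analogy with tf-idf: dp decays with the layer of the dominated point, and idp is the logarithm of the skyline size divided by the number of skyline points that dominate it. All function names are hypothetical.

```python
import math


def dominates(p, q):
    """True if p dominates q, assuming smaller is better on every dimension."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline_layers(points):
    """Peel off successive skylines; layer 1 is the skyline itself."""
    layers, remaining = [], list(points)
    while remaining:
        layer = [p for p in remaining if not any(dominates(q, p) for q in remaining)]
        layers.append(layer)
        remaining = [p for p in remaining if p not in layer]
    return layers


def baseline_scores(points):
    """Score every skyline point by summing over the points it dominates."""
    layers = skyline_layers(points)
    sky = layers[0]
    layer_of = {p: i for i, layer in enumerate(layers, start=1) for p in layer}
    scores = {s: 0.0 for s in sky}
    for p in points:
        dominators = [s for s in sky if dominates(s, p)]
        if not dominators:
            continue  # p is itself a skyline point
        dp = 1.0 / (layer_of[p] - 1)                    # ASSUMED dp weight
        idp = math.log(len(sky) / len(dominators))      # ASSUMED idp weight
        for s in dominators:
            scores[s] += dp * idp
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


points = [(1, 9), (3, 3), (9, 1), (4, 4), (2, 10), (5, 5), (10, 2)]
for sp, score in baseline_scores(points):
    print(sp, round(score, 3))
```

Every skyline point is scored in full here, even one that could never reach the top-k, which is the unnecessary computation that bounding and pruning aim to avoid.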
A Simpler Representation
• More comprehensible for deriving bounds
Bounding the Score
• Q1: What is the score for B?
• A1: Depends on the assignment of the remaining edges
• Q2: What is the maximum score for B?
• A2: Assign the remaining edges appropriately
• Q3: What is the appropriate way?
• A3:
– Same layer → Higher score (dp)
– Minimum overlap → Higher score (idp)
• No overlap → Loose bounds
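Once each skyline point has a lower and an upper bound on its score, pruning follows the usual top-k argument: a point whose upper bound is below the k-th largest lower bound cannot be among the k best. The sketch below shows only that generic step, not SkyIR itself; the function name and the bound values are made up for illustration.

```python
import heapq


def prune_by_bounds(bounds, k):
    """Keep candidates that can still reach the top-k.

    bounds maps a skyline point's label to a (lower, upper) score interval.
    """
    if len(bounds) <= k:
        return dict(bounds)
    # k-th largest lower bound: at least k points are guaranteed this score.
    kth_lower = heapq.nlargest(k, (lo for lo, _ in bounds.values()))[-1]
    return {s: (lo, up) for s, (lo, up) in bounds.items() if up >= kth_lower}


# Hypothetical intervals for four skyline points.
bounds = {"A": (5.0, 9.0), "B": (2.0, 4.0), "C": (6.0, 7.0), "D": (1.0, 6.5)}
print(sorted(prune_by_bounds(bounds, k=2)))
```

With k = 2 the threshold is 5.0, so B (upper bound 4.0) is discarded, while D survives until its bounds are tightened further. Tighter bounds prune more candidates, which is why loose bounds are the weaker option.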
The SkyIR Algorithm
• Priority can be:
– Round Robin (RRB)
– Pending points (PND)
– Upper Bound (UBS)
Experimental Setup
• Datasets
• Algorithms
– Baseline
– SkyIR
• Bounds: Loose (LS), Collaborative (CB)
• 3 Priority schemes: RRB, PND, UBS
Total Runtime – IND distr
k=5, d=3
CB-UBS is 4x faster than the Baseline
Total Runtime – ANT distr
• Interesting fact: ANT is easier than IND (fewer layers to extract)
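The layer count mentioned here can be made concrete by peeling skylines off the dataset one at a time. The sketch below assumes min preferences and uses two extreme, made-up datasets rather than the IND and ANT generators from the experiments: a perfectly anti-correlated set collapses into a single layer, while a perfectly correlated chain needs one layer per point.

```python
def dominates(p, q):
    """True if p dominates q, assuming smaller is better on every dimension."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline_layers(points):
    """Repeatedly extract the skyline of whatever points remain."""
    layers, remaining = [], list(points)
    while remaining:
        layer = [p for p in remaining if not any(dominates(q, p) for q in remaining)]
        layers.append(layer)
        remaining = [p for p in remaining if p not in layer]
    return layers


anti_correlated = [(i, 9 - i) for i in range(10)]  # points on an anti-diagonal
correlated = [(i, i) for i in range(10)]           # each point dominates the next

print(len(skyline_layers(anti_correlated)))
print(len(skyline_layers(correlated)))
```

Fewer layers means fewer extraction rounds, which is consistent with the observation that ANT takes less work than IND.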
Total Runtime – Forest Cover
Memory Consumption
CB, k=5
PND is the best memory-wise
Conclusions
• IR-style ranking for skyline
– Formal framework
– Bounds for efficient computation
• SkyIR algorithm
– Experimental evaluation
• Future Work
– Speed up / Scale up
– Improve bounds (lower, upper)
– Approximation technique(s)
Thank you!
Questions?
Acknowledgements: Heraclitus II fellowship, THALIS – GeomComp, THALIS – DISFER, ARISTEIA – MMD, FP7 INSIGHT