Ken C. K. Lee, Baihua Zheng, Huajing Li, Wang-Chien Lee VLDB 07 Approaching the Skyline in Z Order...

Click here to load reader

download Ken C. K. Lee, Baihua Zheng, Huajing Li, Wang-Chien Lee VLDB 07 Approaching the Skyline in Z Order 1.

of 26

Transcript of Ken C. K. Lee, Baihua Zheng, Huajing Li, Wang-Chien Lee VLDB 07 Approaching the Skyline in Z Order...

  • Slide 1

Ken C. K. Lee, Baihua Zheng, Huajing Li, Wang-Chien Lee VLDB 07 Approaching the Skyline in Z Order 1 Slide 2 Outline Introduction Preliminaries Skyline Processing in Z-Order ZSearch for Skyline Query ZUpdate for Skyline Updates Experiment Conclusion 2 Slide 3 Introduction Finding skyline points from very large in high dimensional space is expensive operation. Most of the work in the literature targets is improving performance of skyline query in high dimensional space. 3 Slide 4 Preliminaries Skyline Problems and Properties 4 Slide 5 (Cont.) Skyline Query Processing: 1.Divide and Conquer Algorithm. 2.Sorting-based Algorithm. 3.Hybrid Algorithm. 5 Slide 6 (Cont.) Divide and Conquer Algorithm: D&C divides a dataset into several small partitions and computes every partial skyline. The complete skyline is obtained by merging all partial skylines and removing dominated data points. Sorting-based Algorithm: SFS is devised based on an observation that by getting a dataset presorted according to a certain monotone scoring function such as sum of attributes. SFS sequentially scans the sorted dataset and keeps a set of skyline candidates. Dominance tests in SFS are based on an exhaustive scan on existing skyline candidates. 6 Slide 7 (Cont.) Hybrid Algorithm: Including Index, NN, and BBS. BBS is based on NN search, and adopts R-tree as its underlying index. 7 Slide 8 (Cont.) The nearest neighbor(NN): dominates is a skyline point. 8 Slide 9 (Cont.) The second NN: dominates not dominated by is another skyline. 9 Slide 10 (Cont.) are the skylines 10 Slide 11 (Cont.) BBS deletion an Insertion: Deletion: Need to found EDR(Exclusive Dominance Region). Insertion: Need compared with other skyline points. 11 Slide 12 Skyline Processing in Z-Order Skyline and Z-Order Curve For a d-dimensional space with as the coordinate value domain ranges, the Z-address of a data point contains dv bits, which can be considered as v d-bit groups. The i- th bit of a Z-address is contributed by the (i/d)-th bit of the (i%d)- th coordinate. 12 Slide 13 (Cont.) Instance : Z-address : (011111) 13 Slide 14 (Cont.) P1(00 11 11), P2(01 01 10), P3(01 10 00), P4(01 11 11) P5(10 01 10), P6(10 10 11), P7(10 11 00), P8(11 00 01) P9(11 11 00) 14 Slide 15 (Cont.) 15 Slide 16 (Cont.) Zbtree Index Structure 16 Slide 17 (Cont.) To facilitate data processing along a Z-order address sequence. To preserve data points in regions to enable efficient search space pruning. Assign a Z-address to all points, store them in a B + tree 17 Slide 18 ZSearch for Skyline Query RZ-Region Based Dominance Test 18 Slide 19 (Cont.) ZSearch Algorithm Use RZ-Region Based Dominance Test Input: ZBtree for source data set Local: a stack s Output: Skyline points 19 Slide 20 (Cont.) 20 Slide 21 ZUpdate for Skyline Updates Insertion: Insert Pinsert compared with skyline that Z-adress smaller than Pinsert. If Pinsert is be dominated, the skyline is the same. Else compared with skyline that Z-adress larger than Pinserts. 21 Slide 22 (Cont.) Deletion Find the points are only dominated by Pdel, and Pdel add to skyline set. Just comparing with the points Z-address larger than Pdels. 22 Slide 23 Experiment 23 Slide 24 (Cont.) 24 Slide 25 (Cont.) 25 Slide 26 Conclusion In this paper, we analyze the skyline problems and exploit the orderingand clustering properties of the Z-order curve which match perfectly well with the skyline processing strategies. The ZSearch algorithm scales very well in both dimensionality and cardinality. 26