SD-Rtree: A Scalable Distributed Rtree
Witold Litwin & Cédric du Mouza & Philippe Rigaux
Plan
• Introduction
• SDDS
• R-tree
• SD-Rtree
• Evolution
• Balancing
• Spatial Rotations
• Overlapping
• Redundant Coverage
• Queries
• Performance
• Conclusion
SDDS Principles (1993)
• Data are at server nodes
  • Communicating through point-to-point messaging
• Overloaded servers split over new servers
• Queries go to client nodes
  • Clients use local images of the SDDS
• No central addressing component
• A node can be both client and server (peer)
SDDS Principles (1993)
• An outdated image may send a query to an incorrect server
• Servers forward such a query to the correct server
• The image gets adjusted
  • An Image Adjustment Message (IAM) comes back
  • The client does not repeat the same error twice
• Data are basically in the RAM of the servers
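The forwarding and image-adjustment principle can be illustrated with a small simulation. This is only a sketch under strong assumptions: keys are one-dimensional integers, each server owns a half-open range, and the names `Server`, `Client`, `route` and `query` are hypothetical, not taken from the slides.

```python
class Server:
    def __init__(self, sid, lo, hi):
        self.sid, self.lo, self.hi = sid, lo, hi
        self.spawned = []  # (lo, hi, server) for each server created by a split

    def split(self, new_sid):
        """Overloaded server hands the upper half of its range to a new server."""
        mid = (self.lo + self.hi) // 2
        new = Server(new_sid, mid, self.hi)
        self.spawned.append((mid, self.hi, new))
        self.hi = mid
        return new

    def route(self, key, hops=0):
        """Answer locally or forward; returns (owning server, forwarding hops)."""
        if self.lo <= key < self.hi:
            return self, hops
        for lo, hi, srv in self.spawned:
            if lo <= key < hi:
                return srv.route(key, hops + 1)
        raise KeyError(key)


class Client:
    def __init__(self, contact):
        # The image starts with the contact server only.
        self.image = {contact.sid: (contact.lo, contact.hi, contact)}

    def query(self, key):
        # Pick the narrowest known range containing the key (possibly outdated).
        candidates = [e for e in self.image.values() if e[0] <= key < e[1]]
        _, _, first = min(candidates, key=lambda e: e[1] - e[0])
        owner, hops = first.route(key)
        if hops:  # the query was forwarded: an IAM comes back
            self.image[owner.sid] = (owner.lo, owner.hi, owner)
        return owner.sid, hops


s0 = Server(0, 0, 100)
client = Client(s0)          # image: server 0 covers [0, 100)
s1 = s0.split(1)             # server 1 takes [50, 100)
s2 = s1.split(2)             # server 2 takes [75, 100)
first_try = client.query(80)   # outdated image, query is forwarded twice
second_try = client.query(80)  # image adjusted, no forwarding
```

After the first query the client has learned the range of server 2, so the same addressing error is not repeated.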
SD-Rtree: a Spatial SDDS
• Distributed spatial data
SD-Rtree: a Spatial SDDS
• Distributed index
• No central component
SD-Rtree: a Spatial SDDS
• Point and window queries
• kNN queries (future)
SD-Rtree: Generalizes R-tree
R-tree:
• Nodes are minimal bounding boxes
• Leaf nodes point to data
• Internal nodes bound subtrees
• Boxes may overlap
• Nodes split on overflow
• Generates a balanced m-ary tree
SD-Rtree: Generalizes R-tree
R-tree:
• An insert may go through multiple paths
• Ends up in the smallest bounding box
  • If there is any
• One of the boxes gets enlarged
• Box may split
SD-Rtree: Generalizes R-tree
R-tree:
• Search may go through multiple paths
• All paths may bring relevant objects
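Why a search may follow several paths can be seen on a toy two-level tree. The sketch below is illustrative only: the `Node` class, the box layout `(xmin, ymin, xmax, ymax)` and the sample rectangles are assumptions made for this example, not data from the slides.

```python
from dataclasses import dataclass, field


def contains(box, p):
    """True if point p = (x, y) lies inside box = (xmin, ymin, xmax, ymax)."""
    xmin, ymin, xmax, ymax = box
    return xmin <= p[0] <= xmax and ymin <= p[1] <= ymax


@dataclass
class Node:
    name: str
    box: tuple
    children: list = field(default_factory=list)  # internal node: sub-nodes
    objects: list = field(default_factory=list)   # leaf: (object name, box)


def point_search(node, p, visited):
    """Collect objects containing p, descending into every matching child."""
    visited.append(node.name)
    hits = [name for name, box in node.objects if contains(box, p)]
    for child in node.children:
        if contains(child.box, p):  # overlapping boxes: several may match
            hits += point_search(child, p, visited)
    return hits


# Two leaves whose bounding boxes overlap in the square (4,4)-(6,6).
leaf_a = Node("A", (0, 0, 6, 6), objects=[("a1", (1, 1, 5, 5))])
leaf_b = Node("B", (4, 4, 10, 10), objects=[("b1", (4, 4, 8, 8))])
root = Node("root", (0, 0, 10, 10), children=[leaf_a, leaf_b])

path = []
result = point_search(root, (4.5, 4.5), path)  # both leaves must be visited
```

A point inside the overlap forces the search into both leaves, and both return a relevant object; a point outside the overlap follows a single path.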
SD-Rtree: a Balanced Binary Tree
The SD-Rtree is a balanced binary tree, distributed on a set of servers, such that:
• Each internal node (or routing node) has exactly two sons
• Each leaf node stores a subset of the indexed dataset
• At each node, the heights of the subtrees differ by at most one
• Each server stores one data node and one routing node
SD-Rtree: Binary Tree Structure
• di = data node (leaf)
• ri = routing node (internal node)
SD-Rtree: Tree Distribution
SD-Rtree Balancing
• The binary tree should be height-balanced
  • The heights of the two subtrees rooted at any node should not differ by more than 1 (cf. AVL trees)
  • The tree height is then logarithmic in the number of leaves
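The logarithmic height follows from a Fibonacci-style recurrence, which the sketch below checks numerically. It assumes a leaf has height 0 and that every internal node has exactly two sons; the function names are hypothetical. The sparsest height-balanced tree of height h has one subtree of height h − 1 and one of height h − 2, so its leaf count grows at least as fast as φ^h, with φ the golden ratio.

```python
import math

PHI = (1 + math.sqrt(5)) / 2  # golden ratio


def min_leaves(h):
    """Fewest leaves of a height-balanced binary tree of height h
    (leaf height 0, every internal node has exactly two sons)."""
    a, b = 1, 2  # minimal leaf counts for heights 0 and 1
    for _ in range(h):
        a, b = b, a + b  # L(h) = L(h-1) + L(h-2)
    return a


def height_bound(leaves):
    """Upper bound on the height for a given number of leaves: log_phi(leaves)."""
    return math.log(leaves, PHI)


# Even the sparsest balanced tree respects the logarithmic bound.
checks = [(h, min_leaves(h), height_bound(min_leaves(h))) for h in range(12)]
```

Since log_φ(x) is about 1.44 · log2(x), the height stays within a constant factor of the height of a perfectly balanced tree.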
SD-Rtree Balancing
• SD-Rtree balancing occurs during splits
  • Messages are sent bottom-up to adjust the height of the ancestor nodes
  • A rotation occurs if an ancestor is imbalanced
• SD-Rtree rotations are spatial
  • They change the rectangles of internal nodes
  • The best rotation minimizes rectangle overlapping
  • Tie-breaking minimizes the "dead space"
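The selection rule (least overlap first, least dead space as tie-breaker) can be sketched as a comparison of candidate regroupings. Everything here is an assumption made for illustration: the candidates are passed in explicitly, height constraints are ignored, and dead space is approximated as the bounding-box area minus the summed areas of the enclosed rectangles.

```python
def mbb(boxes):
    """Minimal bounding box of a list of (xmin, ymin, xmax, ymax) boxes."""
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))


def area(b):
    return max(0, b[2] - b[0]) * max(0, b[3] - b[1])


def overlap(a, b):
    """Area of the intersection of two boxes (0 if they are disjoint)."""
    return area((max(a[0], b[0]), max(a[1], b[1]),
                 min(a[2], b[2]), min(a[3], b[3])))


def dead_space(boxes):
    """Approximation: bounding-box area not accounted for by the boxes."""
    return area(mbb(boxes)) - sum(area(b) for b in boxes)


def best_rotation(candidates):
    """candidates: name -> (left group of boxes, right group of boxes)."""
    def cost(item):
        left, right = item[1]
        return (overlap(mbb(left), mbb(right)),           # primary criterion
                dead_space(left) + dead_space(right))     # tie-breaker
    return min(candidates.items(), key=cost)[0]


# Four made-up subtree rectangles and three ways of pairing them.
f, g = (0, 0, 2, 2), (3, 0, 5, 2)
d, c = (0, 4, 2, 6), (3, 4, 5, 6)
candidates = {
    "fg|dc": ([f, g], [d, c]),  # overlap 0, dead space 4
    "fd|gc": ([f, d], [g, c]),  # overlap 0, dead space 8
    "fc|gd": ([f, c], [g, d]),  # both groups span everything: overlap 30
}
choice = best_rotation(candidates)
```

Two candidates have zero overlap here, so the dead-space criterion decides between them.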
Rotation Pattern
Properties
• The sons of a node are not ordered => more freedom for reorganizing the tree
• Any imbalanced node matches a rotation pattern
A rotation pattern is a subtree a(b(e(f,g),d),c) such that:
• h(c) = h(d) = h(f) = n − 1 (n > 0)
• h(g) = max(0, n − 2)
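The height conditions can be checked by building the pattern for a few values of n. This is a sketch assuming a leaf has height 0; the class `N` and the helper names are hypothetical. With these heights, e has height n, b has height n + 1 and a has height n + 2, so the two sons of a (b and c) differ in height by 2 while b and e remain balanced.

```python
class N:
    """Binary tree node: no sons for a leaf, otherwise exactly two sons."""
    def __init__(self, *sons):
        self.sons = sons


def height(t):
    return 0 if not t.sons else 1 + max(height(s) for s in t.sons)


def full_tree(h):
    """A perfectly balanced filler subtree of height h."""
    return N() if h == 0 else N(full_tree(h - 1), full_tree(h - 1))


def rotation_pattern(n):
    """Build a(b(e(f,g),d),c) with h(c) = h(d) = h(f) = n - 1
    and h(g) = max(0, n - 2)."""
    assert n > 0
    f, d, c = full_tree(n - 1), full_tree(n - 1), full_tree(n - 1)
    g = full_tree(max(0, n - 2))
    e = N(f, g)
    b = N(e, d)
    a = N(b, c)
    return a, b, e


def imbalance(t):
    """Absolute height difference between the two sons of t."""
    return abs(height(t.sons[0]) - height(t.sons[1])) if t.sons else 0


a, b, e = rotation_pattern(3)
report = (height(a), imbalance(a), imbalance(b), imbalance(e))
```

Only the root a of the pattern violates the balance condition, which is why a single local reorganization of f, g, d and c is enough to restore it.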
SD-Rtree: Spatial Rotation
Rotation Cost
• Constant number of messages (3 or 6, depending on the choice)
• Few rotations in practice
  • In particular when the dataset is uniformly distributed
  • See our experiments
SD-Rtree: Images
• Each image defines the addressing structure
  • Resides as a cache on a client or on a peer
  • Starts with the address of the contact server
  • IAMs make it a subtree
• Splits make images outdated
  • IAMs adjust them incrementally
Image Adjustment
• The client contacts a server with a query
• Each incorrect server initiates a traversal of the tree
• During the traversal, the description of the nodes is collected
• The correct server sends the up-to-date tree structure
• The client updates its image
Out-of-range situation
Insertion of objects
Overlapping management
• The directory rectangles in an Rtree may overlap
  • The local subtree does not suffice for locating all the nodes that contain the point (point query) or the window (window query) searched for
• SD-Rtree servers maintain data on node overlapping: the Redundant Coverage
  • It avoids systematically accessing the root node
Redundant Coverage Example
• The region common to A and B is stored on both nodes
• If a point query sent to A falls in the region shared with B, A sends a point query message to B
• For D, we must keep the intersection with C or B; here it is empty
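The example can be mimicked with a few rectangles. This is a rough sketch, not the structure actually kept by SD-Rtree servers: the `DataNode` class, its `coverage` table and the coordinates of A, B and D are all made up for illustration.

```python
def intersect(a, b):
    """Intersection of two (xmin, ymin, xmax, ymax) boxes, or None if empty."""
    box = (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))
    return box if box[0] <= box[2] and box[1] <= box[3] else None


def contains(box, p):
    return box[0] <= p[0] <= box[2] and box[1] <= p[1] <= box[3]


class DataNode:
    def __init__(self, name, box):
        self.name, self.box = name, box
        self.coverage = {}  # other node -> region shared with it

    def record_overlap(self, other):
        """Store the common region on both nodes (nothing if it is empty)."""
        shared = intersect(self.box, other.box)
        if shared is not None:
            self.coverage[other] = shared
            other.coverage[self] = shared

    def point_query(self, p, forwarded=False):
        """Names of the nodes covering p, forwarding to overlapping nodes."""
        answers = [self.name] if contains(self.box, p) else []
        if not forwarded:
            for other, shared in self.coverage.items():
                if contains(shared, p):  # p falls in the shared region
                    answers += other.point_query(p, forwarded=True)
        return answers


A = DataNode("A", (0, 0, 6, 6))
B = DataNode("B", (4, 4, 10, 10))
D = DataNode("D", (12, 0, 14, 2))
A.record_overlap(B)  # shared region (4, 4, 6, 6) kept on A and on B
D.record_overlap(B)  # empty intersection: nothing to keep

in_shared_region = A.point_query((5, 5))
outside_shared_region = A.point_query((1, 1))
```

A query landing in the shared region reaches B directly from A, without climbing to a common ancestor.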
Queries
Point queries and window queries. The technique is similar to the insertion algorithm:
• Search in the client image for a server whose mbb contains the point or intersects the window
• Send the query to this server
• If the server actually covers the point or the window, it answers to the client; else it sends the query to its parent node
• A server uses the overlapping information to transmit the query
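The client-side lookup and the climb to the parent can be sketched for point queries as follows. The sketch makes several simplifying assumptions: it runs in a single process, it leaves out the overlapping information mentioned in the last bullet, and `TreeNode`, `point_query` and the sample rectangles are hypothetical.

```python
def contains(box, p):
    return box[0] <= p[0] <= box[2] and box[1] <= p[1] <= box[3]


class TreeNode:
    def __init__(self, name, box, sons=()):
        self.name, self.box, self.sons, self.parent = name, box, sons, None
        for s in sons:
            s.parent = self

    def collect(self, p, out):
        """Top-down search for the data nodes (leaves) whose box contains p."""
        if not contains(self.box, p):
            return
        if not self.sons:
            out.append(self.name)
        for s in self.sons:
            s.collect(p, out)


def point_query(image, contact, p):
    """image: data node -> mbb as cached by the client (possibly outdated).
    Returns (names of answering data nodes, messages sent upward)."""
    # 1. Look in the image for a server whose cached mbb contains the point.
    server = next((n for n, box in image.items() if contains(box, p)), contact)
    # 2. If the server does not actually cover the point, go up to the parent.
    node, hops = server, 0
    while not contains(node.box, p) and node.parent is not None:
        node, hops = node.parent, hops + 1
    # 3. Answer from the first node that covers the point.
    found = []
    node.collect(p, found)
    return found, hops


d1 = TreeNode("d1", (0, 0, 4, 4))
d2 = TreeNode("d2", (5, 0, 9, 4))
r = TreeNode("r", (0, 0, 9, 4), (d1, d2))

stale_image = {d1: (0, 0, 9, 4)}                 # taken before d1 split
fresh_image = {d1: (0, 0, 4, 4), d2: (5, 0, 9, 4)}

via_parent = point_query(stale_image, d1, (7, 1))
direct = point_query(fresh_image, d1, (7, 1))
```

With the stale image the query first reaches d1 and needs one extra message to the parent; with the adjusted image it goes straight to d2.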
Experiments
• Synthetic data (points and rectangles) generated with GSTD
  • 50,000 to 500,000 objects
  • 0 to 3,000 queries
  • Server capacity: 3,000 objects
• Comparison of three SD-Rtree variants:
  • BASIC: no image; every query is processed top-down from the root
  • IMSERVER: no IAMs among the servers
  • IMCLIENT: client images
Per Insert Cost
Cost of balancing
Image convergence
Distribution of messages
Cost per Query
Conclusion
• SD-Rtree is an efficient scalable distributed Rtree
  • For very large spatial data collections
  • Can be processed in distributed RAM
    • Access time much faster than to disk data
  • Load balancing
    • Spatial rotations
  • Overlapping management
    • Redundant coverage
  • O(log n) worst insert cost
• Future work
  • kNN queries
  • Object distribution balancing on servers