CSIS 7101: Spatial Data (Part 2)
Efficient Processing of Spatial Joins Using R-trees
Rollo Chan, Chu Chung Man, Mak Wai Yip, Vivian Lee,
Eric Lo, Sindy Shou, Hugh Wang
What is spatial data? Data consisting of points, lines, rectangles, polygons, surfaces, …
Two types of queries in a database system (DBS): single-scan and multiple-scan queries.
How can spatial objects in a GIS be retrieved efficiently? With a spatial access method (SAM), e.g. the R*-tree.
What is a Spatial Access Method?
Designed to support single-scan queries, e.g. the window query "Find all objects which intersect a given window".
Attempts to store objects that are close together in the data space on a common page, which reduces the number of disk accesses.
How is a window query processed by a SAM?
1) Filter step: find all objects whose minimum bounding rectangles (MBRs) intersect the query rectangle.
2) Refinement step: check whether these candidate objects actually fulfill the query condition.
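The two steps can be sketched as follows. This is a minimal illustration, not the SAM itself: no index is used, and the exact object geometry is assumed to be a circle (the `Circle` class is hypothetical) so that the refinement step has something more precise than the MBR to test.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Closed axis-parallel rectangle [xl, xu] x [yl, yu]."""
    xl: float
    yl: float
    xu: float
    yu: float

    def intersects(self, other: "Rect") -> bool:
        return (self.xl <= other.xu and other.xl <= self.xu
                and self.yl <= other.yu and other.yl <= self.yu)


@dataclass(frozen=True)
class Circle:
    """Hypothetical exact object geometry, used only in the refinement step."""
    cx: float
    cy: float
    r: float

    def mbr(self) -> Rect:
        return Rect(self.cx - self.r, self.cy - self.r,
                    self.cx + self.r, self.cy + self.r)

    def intersects_rect(self, q: Rect) -> bool:
        # Squared distance from the centre to the closest point of q.
        dx = max(q.xl - self.cx, 0.0, self.cx - q.xu)
        dy = max(q.yl - self.cy, 0.0, self.cy - q.yu)
        return dx * dx + dy * dy <= self.r * self.r


def window_query(objects, window: Rect):
    # 1) Filter step: cheap MBR test; candidates may include false hits.
    candidates = [o for o in objects if o.mbr().intersects(window)]
    # 2) Refinement step: exact geometry test, only on the candidates.
    hits = [o for o in candidates if o.intersects_rect(window)]
    return candidates, hits
```

With the window `Rect(0.8, 0.8, 2.2, 2.2)`, `Circle(0, 0, 1)` passes the filter (its MBR touches the window) but is rejected in refinement, which is exactly the false hit the second step exists to remove.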
What is a Spatial Join?
It combines two sets of spatial objects according to some spatial property (e.g. intersection).
It is an important type of multiple-scan query in spatial database systems.
Example of a Spatial Join
Two relations: forests and cities.
(Assume an attribute in each relation represents the borders of the forests and cities.)
An example query: "Find all forests which are in a city".
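To make the cost concrete, here is a naive nested-loop join over hypothetical forest and city MBRs (all names and coordinates are invented). It uses MBR intersection as the filter predicate only; "in a city" would still need an exact refinement test. Every forest is compared with every city, so the number of tests is |forests| × |cities|.

```python
from itertools import product


def intersects(a, b):
    """a, b: closed rectangles given as (xl, yl, xu, yu)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def nested_loop_join(forests, cities):
    """Naive spatial join: every forest is tested against every city."""
    result, tests = [], 0
    for (f, f_rect), (c, c_rect) in product(forests.items(), cities.items()):
        tests += 1
        if intersects(f_rect, c_rect):
            result.append((f, c))
    return result, tests


forests = {"F1": (0, 0, 2, 2), "F2": (5, 5, 7.5, 7.5), "F3": (8, 0, 9, 1)}
cities = {"C1": (1, 1, 4, 4), "C2": (7, 7, 9, 9)}
print(nested_loop_join(forests, cities))  # -> ([('F1', 'C1'), ('F2', 'C2')], 6)
```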
Problems when Performing a Spatial Join
It is expensive in terms of both CPU time and I/O time.
Traditional index structures are not efficient for spatial joins.
How can it be made more efficient? Use R*-trees.
Why Use R*-trees for Spatial Joins?
To optimize both CPU time and I/O time
Fewer comparisons than a simple nested loop
Other algorithms cannot be applied efficiently to spatial joins
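R*-tree Approach for Spatial Join
Suppose there are two R*-trees, R and S.
Idea:
Use the property that directory rectangles form the minimum bounding boxes of the data rectangles in the corresponding subtrees.
If the rectangles of two directory entries ER and ES do not intersect, there can be no pair (rectR, rectS) of intersecting data rectangles with rectR in the subtree of ER and rectS in the subtree of ES. Only if they intersect can such a pair exist, so only then must both subtrees be searched.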
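A minimal sketch of the resulting synchronized traversal of both trees, assuming both trees have the same height. The `Entry`/`Node` classes are hypothetical in-memory stand-ins for R*-tree pages; page reads, buffering and the tuning techniques on the following slides are omitted.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Rect = Tuple[float, float, float, float]  # (xl, yl, xu, yu)


def intersects(a: Rect, b: Rect) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


@dataclass
class Entry:
    rect: Rect                      # directory rectangle or data rectangle (MBR)
    child: Optional["Node"] = None  # subtree pointer (directory entries only)
    oid: Optional[str] = None       # object id (data entries only)


@dataclass
class Node:
    entries: List[Entry]
    leaf: bool


def spatial_join(r: Node, s: Node, out: List[Tuple[str, str]]) -> None:
    """Synchronized traversal of two R-trees, assumed to have equal height."""
    for es in s.entries:
        for er in r.entries:
            if not intersects(er.rect, es.rect):
                continue            # disjoint rectangles: prune both subtrees
            if r.leaf:
                out.append((er.oid, es.oid))
            else:
                spatial_join(er.child, es.child, out)
```

The pruning in the `continue` branch is exactly the property stated above: disjoint directory rectangles rule out every pair below them.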
Minimum Bounding Box
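A directory rectangle is the minimum bounding box of the rectangles stored below it. A minimal sketch of that computation, assuming rectangles are `(xl, yl, xu, yu)` tuples:

```python
from typing import Iterable, Tuple

Rect = Tuple[float, float, float, float]  # (xl, yl, xu, yu)


def mbr(rects: Iterable[Rect]) -> Rect:
    """Smallest axis-parallel rectangle enclosing all given rectangles."""
    rects = list(rects)
    if not rects:
        raise ValueError("the MBR of an empty set is undefined")
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))
```

For example, `mbr([(0, 0, 1, 1), (2, -1, 3, 0.5)])` gives `(0, -1, 3, 1)`.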
Is there any way to be more efficient? Two areas need to be taken into account:
CPU-Time Tuning
I/O-Time Tuning
CPU-Time Tuning
Two ways to improve CPU time:
Restricting the search space
Spatial sorting and plane sweep
Restricting the Search Space
Idea:
Scan through each of the two nodes and mark all entries that are required for performing the join, i.e. those that intersect the intersection rectangle of the two nodes.
Then each marked entry of one node is tested against all marked entries of the other node.
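A minimal sketch of this restriction for one pair of nodes, with node and entry rectangles as `(xl, yl, xu, yu)` tuples (function names are hypothetical). The returned test count includes the two linear scans, matching the way the next slide counts them.

```python
def intersects(a, b):
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def intersection(a, b):
    """Intersection rectangle of a and b, or None if they are disjoint."""
    xl, yl = max(a[0], b[0]), max(a[1], b[1])
    xu, yu = min(a[2], b[2]), min(a[3], b[3])
    return (xl, yl, xu, yu) if xl <= xu and yl <= yu else None


def restricted_join(r_rect, r_entries, s_rect, s_entries):
    """Join two nodes, testing only entries inside their common area."""
    common = intersection(r_rect, s_rect)
    if common is None:
        return [], 0
    # One linear scan per node marks the entries overlapping the common area.
    r_marked = [e for e in r_entries if intersects(e, common)]
    s_marked = [e for e in s_entries if intersects(e, common)]
    tests = len(r_entries) + len(s_entries)
    pairs = []
    for er in r_marked:          # marked x marked instead of all x all
        for es in s_marked:
            tests += 1
            if intersects(er, es):
                pairs.append((er, es))
    return pairs, tests
```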
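Restricting the Search Space (cont’d)
[Figure: two overlapping nodes, one of R and one of S, each holding entries numbered 1 to 7.]
Original: 7 entries of R × 7 entries of S = 49 tests
Now: 3 marked entries of R × 2 marked entries of S = 6 tests
Plus scanning: 7 entries of R + 7 entries of S = 14 tests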
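Adding the slide's counts gives the total cost of both approaches; here $|E_R|, |E_S|$ are the numbers of entries in the two nodes and $|M_R|, |M_S|$ the numbers of marked entries (notation introduced for this sketch):

```latex
\begin{aligned}
\text{nested loop:}\quad & |E_R| \cdot |E_S| = 7 \times 7 = 49 \text{ tests} \\
\text{restricted:}\quad & \underbrace{|E_R| + |E_S|}_{\text{scanning}}
  + \underbrace{|M_R| \cdot |M_S|}_{\text{marked pairs}}
  = (7 + 7) + 3 \times 2 = 20 \text{ tests}
\end{aligned}
```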
Spatial Sorting and Plane Sweep
Idea:
Sort the entries in a node of the R*-tree according to the spatial location of the corresponding rectangles (e.g. by their lower x-coordinate).
Then move a sweep line perpendicular to one of the axes from left to right to compute the intersections.
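A minimal sketch of the sorted intersection test for two sequences of rectangles, sweeping along the x-axis. The `Rect` record and the coordinates are hypothetical; they were chosen so that the sweep reproduces the trace in the sorted intersection test example (r1, s1, r2, s2, r3 in sweep order).

```python
from typing import List, NamedTuple, Tuple


class Rect(NamedTuple):
    name: str
    xl: float
    yl: float
    xu: float
    yu: float


def _internal_loop(t: Rect, start: int, seq: List[Rect],
                   out: List[Tuple[str, str]], t_is_r: bool) -> None:
    # Walk seq from `start` while seq[k] still begins left of t's right edge.
    k = start
    while k < len(seq) and seq[k].xl <= t.xu:
        if t.yl <= seq[k].yu and t.yu >= seq[k].yl:   # y-extents overlap
            out.append((t.name, seq[k].name) if t_is_r else (seq[k].name, t.name))
        k += 1


def sorted_intersection_test(r_seq: List[Rect], s_seq: List[Rect]) -> List[Tuple[str, str]]:
    r_seq = sorted(r_seq, key=lambda e: e.xl)   # sort both sequences by xl
    s_seq = sorted(s_seq, key=lambda e: e.xl)
    out: List[Tuple[str, str]] = []
    i = j = 0
    while i < len(r_seq) and j < len(s_seq):
        if r_seq[i].xl < s_seq[j].xl:           # sweep line reaches r_seq[i]
            _internal_loop(r_seq[i], j, s_seq, out, t_is_r=True)
            i += 1
        else:                                    # sweep line reaches s_seq[j]
            _internal_loop(s_seq[j], i, r_seq, out, t_is_r=False)
            j += 1
    return out


R = [Rect("r1", 0, 0, 2, 2), Rect("r2", 3, 2, 8, 4), Rect("r3", 7, 6, 10, 8)]
S = [Rect("s1", 1, 1, 4, 3), Rect("s2", 5, 3, 6, 5), Rect("s3", 7.5, 3.5, 9, 7)]
print(sorted_intersection_test(R, S))
# -> [('r1', 's1'), ('r2', 's1'), ('r2', 's2'), ('r2', 's3'), ('r3', 's3')]
```

Each rectangle is compared only with rectangles of the other sequence whose x-extent can still overlap it, instead of with every rectangle.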
Example of Sorted Intersection Test
t = r1: r1 <--> s1
t = s1: s1 <--> r2
t = r2: r2 <--> s2, r2 <--> s3
t = s2: -
t = r3: r3 <--> s3
[Figure: the sweep line moves from left to right; at t = r1, s1 is tested against r1 because s1.xl < r1.xu.]
I/O-Time Tuning
Goal: achieve good I/O performance with a buffer that is as small as possible, since the R*-trees might occupy only a small portion of the LRU buffer.
Compute a read schedule of the pages that minimizes the number of disk accesses, using a local optimization policy based on spatial locality.
Idea of a read schedule: if a frequently used page always resides in the buffer, the number of disk accesses can be reduced considerably.
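To compare read schedules, one can replay a sequence of page accesses against an LRU buffer and count the misses. A minimal sketch, assuming the buffer capacity is given in pages and page identifiers are hypothetical strings:

```python
from collections import OrderedDict
from typing import Hashable, Iterable


def count_disk_accesses(schedule: Iterable[Hashable], capacity: int) -> int:
    """Replay a page-access sequence against an LRU buffer of `capacity` pages."""
    buffer: OrderedDict = OrderedDict()
    misses = 0
    for page in schedule:
        if page in buffer:
            buffer.move_to_end(page)        # hit: mark as most recently used
            continue
        misses += 1                          # miss: page must be read from disk
        if capacity > 0:
            if len(buffer) >= capacity:
                buffer.popitem(last=False)   # evict the least recently used page
            buffer[page] = None
    return misses
```

With the schedule `["r1", "s1", "r1", "s2", "r1", "s3"]`, a 2-page buffer keeps the frequently used page `r1` resident and needs 4 disk accesses, while a 1-page buffer needs 6.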
Three such techniques:
Local plane sweep
Local plane sweep with pinning
Local z-order
Local Plane-Sweep Order
Idea:
Based on a spatial ordering, the plane-sweep algorithm creates a sequence of pairs of intersecting rectangles.
This sequence can be used to determine the read schedule of the spatial join.
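A minimal sketch of turning a plane-sweep-ordered sequence of intersecting pairs into a read schedule, listing each page in the order it is first requested. It assumes the pairs already come from a sweep such as the sorted intersection test, and that the two pages of each pair are requested in the order given; page names are hypothetical.

```python
from typing import Iterable, List, Tuple


def read_schedule(pairs: Iterable[Tuple[str, str]]) -> List[str]:
    """Pages in the order they are first needed by the sweep-ordered pairs."""
    schedule: List[str] = []
    seen = set()
    for first, second in pairs:
        for page in (first, second):
            if page not in seen:
                seen.add(page)
                schedule.append(page)
    return schedule
```

For example, the pairs `[("s1", "r1"), ("s1", "r2"), ("s2", "r2")]` give the schedule `["s1", "r1", "r2", "s2"]`.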
Local Plane-Sweep Order (cont’d)
[Figure: rectangles r1 to r4 and s1, s2, with labels 1 to 6.]
Read schedule: <s1, s2, r2, r1, r4, r3>
Local Plane-Sweep Order with Pinning
Idea:
1. Determine a pair (Er, Es) of entries with respect to the local plane-sweep order. Compute the degree of the rectangles of both entries, where Deg(E.rect) = the number of intersections between E.rect and the rectangles belonging to entries of the other tree that have not yet been processed.
2. Pin the page in the buffer whose corresponding rectangle has the maximal degree.
3. Perform the spatial join between the pinned page and all other pages whose rectangles intersect its rectangle and have not yet been processed.
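The pinning steps can be sketched on a list of intersecting page pairs given in local plane-sweep order. This is a simplified illustration: the degree of a page is counted as the number of still-unprocessed pairs it takes part in, ties are assumed to favour the page of R, and the buffer itself is not modelled.

```python
from typing import List, Tuple

Pair = Tuple[str, str]  # (r_page, s_page), in local plane-sweep order


def pinning_order(pairs: List[Pair]) -> List[Tuple[str, List[Pair]]]:
    """Return (pinned_page, pairs_joined_while_pinned) for each pinning round."""
    pending = list(pairs)
    rounds = []
    while pending:
        er, es = pending[0]                          # step 1: next pair
        deg_r = sum(1 for r, _ in pending if r == er)
        deg_s = sum(1 for _, s in pending if s == es)
        # Step 2: pin the page with the larger degree (tie -> R, an assumption).
        pinned, side = (er, 0) if deg_r >= deg_s else (es, 1)
        # Step 3: join the pinned page with all unprocessed partners.
        joined = [p for p in pending if p[side] == pinned]
        pending = [p for p in pending if p[side] != pinned]
        rounds.append((pinned, joined))
    return rounds
```

With the pairs `[("r1", "s2"), ("r2", "s1"), ("r2", "s2"), ("r3", "s2"), ("r4", "s1")]`, the first pair has Deg(r1) = 1 and Deg(s2) = 3, so `s2` is pinned and joined with r1, r2 and r3 before `s1` is pinned.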
Local Plane-Sweep Order with Pinning (cont’d)
[Figure: rectangles r1 to r4 and s1, s2; the current pair (Er, Es) has Er.rect = r1 and Es.rect = s2, and the degrees Deg(r1) and Deg(s2) are compared.]
Local Z-Order
Idea:
1. Compute the intersections between each rectangle of one node and all rectangles of the other node
2. Sort the rectangles according to the spatial location of their centers
3. Decompose the underlying space into cells of equal size and provide an ordering on this set of cells
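A minimal sketch of these steps, assuming a square grid of 2^bits × 2^bits equal cells over a known `space` rectangle and ordering the cells by a Z-order (Morton) code obtained by bit interleaving. Function names and parameters are hypothetical; pairs whose intersection centres fall in the same cell keep their discovery order.

```python
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (xl, yl, xu, yu)


def interleave(cx: int, cy: int, bits: int) -> int:
    """Z-order (Morton) code: interleave the bits of the cell coordinates."""
    code = 0
    for b in range(bits):
        code |= ((cx >> b) & 1) << (2 * b)
        code |= ((cy >> b) & 1) << (2 * b + 1)
    return code


def local_z_order(r_entries, s_entries, space: Rect, bits: int = 1) -> List[Tuple[str, str]]:
    """r_entries, s_entries: lists of (page, rect). Returns intersecting pairs
    sorted by the Z-order cell of the centre of their intersection rectangle."""
    cells = 1 << bits                               # cells per axis
    w = (space[2] - space[0]) / cells
    h = (space[3] - space[1]) / cells
    keyed = []
    for rp, r in r_entries:                         # step 1: all intersections
        for sp, s in s_entries:
            xl, yl = max(r[0], s[0]), max(r[1], s[1])
            xu, yu = min(r[2], s[2]), min(r[3], s[3])
            if xl > xu or yl > yu:
                continue                            # disjoint: no pair
            # Steps 2 and 3: locate the centre in the equal-size grid.
            cx = min(int(((xl + xu) / 2 - space[0]) / w), cells - 1)
            cy = min(int(((yl + yu) / 2 - space[1]) / h), cells - 1)
            keyed.append((interleave(cx, cy, bits), rp, sp))
    keyed.sort(key=lambda t: t[0])                  # stable: ties keep discovery order
    return [(rp, sp) for _, rp, sp in keyed]
```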
Local Z-Order (cont’d)
[Figure: rectangles r1 to r4 and s1, s2 over a space divided into four cells ordered I, II, III, IV.]
Read schedule: <s1, r2, r1, s2, r4, r3>
Number of Disk Accesses
[Chart: number of disk accesses (0 to 7000) for LPS order, LPS order with pinning and Z-order, for LRU buffer sizes of 0, 8, 32, 128 and 512 KByte; labelled values include 5384, 5290, 2373 and 2392.]
Number of Disk Accesses (cont’d)
[Chart: number of disk accesses (0 to 6000) for the original order and LPS order with pinning, for LRU buffer sizes of 0, 8, 32, 128 and 512 KByte.]
Q & A
That’s it for the presentation. Any questions?
Reference
1. Brinkhoff, T., Kriegel, H.-P., Seeger, B. (1993). Efficient Processing of Spatial Joins Using R-trees. ACM SIGMOD, Washington, DC, USA. Institute of Computer Science, University of Munich.