R++-tree: an efficient spatial access method for highly redundant point data
Martin Šumák, Peter Gurský
University of P. J. Šafárik in Košice
Research motivation
Besides kNN and range queries, an R-tree-like index can also be used to compute top-k queries (find the best k objects according to user preferences), for example with a scoring function such as
h(x1, x2) = f1(x1) + f2(x2)
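To make the additive scoring function concrete, here is a minimal brute-force sketch of a top-k query. The preference functions `f1` (cheaper is better) and `f2` (larger area is better), their scaling constants, and the sample flats are assumptions made for illustration; the index-based evaluation from the talk is not shown.

```python
import heapq

def f1(price):
    # Hypothetical preference: cheaper is better, scaled to [0, 1].
    return max(0.0, 1.0 - price / 200_000)

def f2(area):
    # Hypothetical preference: larger is better, capped at 100 square metres.
    return min(area, 100.0) / 100.0

def h(x1, x2):
    # Additive aggregation from the slide: h(x1, x2) = f1(x1) + f2(x2).
    return f1(x1) + f2(x2)

def top_k(points, k):
    # Brute-force baseline: score every point and keep the k best.
    return heapq.nlargest(k, points, key=lambda p: h(p[0], p[1]))

# Made-up (price, area) pairs.
flats = [(100_000, 50.0), (150_000, 90.0), (50_000, 30.0), (200_000, 100.0)]
best = top_k(flats, 2)
print(best)
```

A spatial index is useful here because it lets the query avoid scoring every object, which this baseline does.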
Martin Šumák, Peter Gurský at ADBIS 2013
Why highly redundant point data
Our data consists of flats with the following attributes:
- price
- area
- floor
- max floor of building
- year of approbation
- number of rooms
Each flat is represented by a point in 6-dimensional space.
R+-tree fundamentals
R+-tree is an R-tree-like index with the following special properties:
- zero overlap between nodes at the same level
- the rectangles of the child nodes together cover the whole parent's rectangle
- suitable for point data and point queries
R+-tree fundamentals
Desired state: zero overlaps and minimum bounding rectangles.
R+-tree avoids overlaps at the cost of rectangle size.
The R++-tree idea
Desired state: zero overlaps and minimum bounding rectangles.
R++-tree inner nodes keep two rectangles for each child node: the minimum bounding one and the parent-covering one.
Leaf nodes are left unchanged.
Nodes of R++-tree
Leaf nodes
- Exactly the same as leaf nodes of R+-tree
- Contain the ID and coordinates of each object
- Take one disk page each
Inner nodes
- Contain a pointer and two rectangles for each child node
- Take two disk pages each
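The node layout described above could be modelled in memory roughly as follows. This is only a sketch: the class and field names (`Rect`, `LeafNode`, `ChildEntry`, `InnerNode`, `mbr`, `cover`) are hypothetical and not taken from the authors' implementation.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union

Point = Tuple[float, ...]

@dataclass
class Rect:
    low: Point
    high: Point

    def contains(self, p: Point) -> bool:
        # Closed-interval containment test in every dimension.
        return all(l <= x <= h for l, x, h in zip(self.low, p, self.high))

@dataclass
class LeafNode:
    # Each entry is (object id, coordinates), as in an R+-tree leaf.
    entries: List[Tuple[int, Point]] = field(default_factory=list)

@dataclass
class ChildEntry:
    child: Union["InnerNode", LeafNode]
    mbr: Rect    # minimum bounding rectangle of the child's data
    cover: Rect  # parent-covering rectangle assigned to the child

@dataclass
class InnerNode:
    entries: List[ChildEntry] = field(default_factory=list)

# Tiny example: one leaf whose MBR is much smaller than its covering rectangle.
leaf = LeafNode(entries=[(1, (0.2, 0.2)), (2, (0.3, 0.4))])
entry = ChildEntry(
    child=leaf,
    mbr=Rect((0.2, 0.2), (0.3, 0.4)),
    cover=Rect((0.0, 0.0), (0.5, 1.0)),
)
root = InnerNode(entries=[entry])
```

In this example the point (0.0, 0.0) lies inside the covering rectangle but outside the minimum bounding rectangle, which is exactly the empty space the second rectangle lets a search skip.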
Use of the two rectangles in inner nodes
Searching
- Only the minimum bounding rectangles are necessary
Inserting new objects
- Both the minimum bounding and the parent-covering rectangles need to be used (read/updated)
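As an illustration of why searching needs only the minimum bounding rectangles, here is a small range-search sketch. The dictionary-based node layout, the function names, and the sample data are assumptions for this example, not the authors' code.

```python
def intersects(a, b):
    # a and b are rectangles given as (low, high) tuples of coordinates.
    return all(al <= bh and bl <= ah
               for al, ah, bl, bh in zip(a[0], a[1], b[0], b[1]))

def inside(p, r):
    return all(l <= x <= h for l, x, h in zip(r[0], p, r[1]))

def range_search(node, query, out):
    if node["leaf"]:
        out.extend(oid for oid, p in node["entries"] if inside(p, query))
        return
    for mbr, cover, child in node["entries"]:
        # The covering rectangle is deliberately ignored: a child can only
        # hold matching points if its minimum bounding rectangle is hit.
        if intersects(mbr, query):
            range_search(child, query, out)

leaf_a = {"leaf": True, "entries": [(1, (0.1, 0.1)), (2, (0.2, 0.3))]}
leaf_b = {"leaf": True, "entries": [(3, (0.8, 0.9))]}
mbr_a, cover_a = ((0.1, 0.1), (0.2, 0.3)), ((0.0, 0.0), (0.5, 1.0))
mbr_b, cover_b = ((0.8, 0.9), (0.8, 0.9)), ((0.5, 0.0), (1.0, 1.0))
root = {"leaf": False,
        "entries": [(mbr_a, cover_a, leaf_a), (mbr_b, cover_b, leaf_b)]}

query = ((0.3, 0.0), (0.9, 1.0))
result = []
range_search(root, query, result)
print(result)
```

The query overlaps the covering rectangle of `leaf_a` but not its minimum bounding rectangle, so that leaf is never visited. An insert, by contrast, would have to consult the covering rectangles to decide which child is responsible for a new point.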
Implementation of inner nodes
- The first page contains the minimum bounding rectangles
- The second page contains the parent-covering rectangles
Advantages and drawbacks of the two-page idea
Advantages
- Searching requires reading only one page per node involved
- The ratio between page size and node capacity is the same as in R+-tree
Drawbacks
- When updating, two pages per inner node need to be processed
- The real impact on the whole index size is relatively low
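The capacity and size claims above can be checked with some rough arithmetic. The sketch below assumes a 4096-byte page, 8-byte coordinates, 8-byte pointers and object IDs, no page header, child pointers stored on the first page, and completely full nodes; none of these figures come from the talk.

```python
import math

PAGE = 4096   # assumed page size in bytes
COORD = 8     # assumed bytes per coordinate
PTR = 8       # assumed bytes per child pointer or object id

def rect_bytes(d):
    # A d-dimensional rectangle stores a low and a high corner.
    return 2 * d * COORD

def fanout_two_pages(d):
    # First page: pointer + minimum bounding rectangle per child.
    # Covering rectangles live on the second page, so they cost no capacity.
    return PAGE // (PTR + rect_bytes(d))

def fanout_one_page(d):
    # Alternative layout: both rectangles squeezed into a single page.
    return PAGE // (PTR + 2 * rect_bytes(d))

def leaf_capacity(d):
    return PAGE // (PTR + d * COORD)

def inner_node_count(leaves, fanout):
    # Number of inner nodes above `leaves` full leaves, level by level.
    total, level = 0, leaves
    while level > 1:
        level = math.ceil(level / fanout)
        total += level
    return total

d, n = 6, 100_000
leaves = math.ceil(n / leaf_capacity(d))
inner = inner_node_count(leaves, fanout_two_pages(d))
extra = inner / (leaves + inner)  # extra pages relative to one page per node
print(fanout_two_pages(d), fanout_one_page(d), leaves, inner, round(extra, 4))
```

Under these assumptions the two-page layout keeps the same fan-out as a node with one rectangle per child, while a single-page layout would roughly halve it. Because inner nodes are a small fraction of all nodes, doubling their pages adds only a few percent to the index size.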
Experiments - data
Artificial data (range, kNN and top-k queries)
- 100 000 random points in 2- to 10-dimensional space, with one of:
  - decimal values within [0; 1]
  - integer values from 1 to 100
  - integer values from 1 to 10
Pseudo-real data (top-k query)
- 6-dimensional points: data of flats for sale
  - 550 000 flats (20-multiple set)
  - 2 700 000 flats (100-multiple set)
Experiments - measures
- 300 random queries per data set and query type
- Average time per query
- Average number of I/Os per query
  - One I/O corresponds to reading one page, i.e. processing one node
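The I/O measure could be collected along these lines. This is a sketch only: `PageCounter`, `average_ios`, and the toy one-dimensional "index" are hypothetical stand-ins, and three fixed queries are used here instead of the 300 random ones from the experiments.

```python
from statistics import mean

class PageCounter:
    """Counts page reads; one read corresponds to one I/O."""
    def __init__(self):
        self.reads = 0

    def read(self, page):
        self.reads += 1
        return page

def average_ios(run_query, queries):
    # Run each query with a fresh counter and average the read counts.
    counts = []
    for q in queries:
        counter = PageCounter()
        run_query(q, counter)
        counts.append(counter.reads)
    return mean(counts)

# Toy stand-in for an index: ten "pages" of ten consecutive integers each.
pages = [list(range(i, i + 10)) for i in range(0, 100, 10)]

def run_query(q, counter):
    lo, hi = q
    for page in pages:
        # Read a page only if its value range overlaps the query range.
        if page[0] <= hi and page[-1] >= lo:
            counter.read(page)

queries = [(0, 9), (5, 25), (95, 99)]
avg = average_ios(run_query, queries)
print(avg)
```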
Artificial data: 100 000 random points with decimal values within [0; 1] (three slides of figures)
Artificial data: 100 000 random points with integer values from 1 to 100 (three slides of figures)
Artificial data: 100 000 random points with integer values from 1 to 10 (three slides of figures)
Pseudo-real data: 550 000 flats (i.e. 6-dimensional points) (two slides of figures)
Pseudo-real data: 2 700 000 flats (i.e. 6-dimensional points) (one slide of figures)
Thank you for your attention