R++-tree: an efficient spatial access method for highly redundant point data

Martin Šumák, Peter Gurský

University of P. J. Šafárik in Košice

Research motivation

Besides kNN and range queries, an R-tree-like index can also be used to compute top-k queries (find the best k objects according to user preferences), expressed by a scoring function such as:

h(x1, x2) = f1(x1) + f2(x2)
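The scoring function above can be read as a sum of per-attribute preference functions. The following is a minimal brute-force sketch, not the authors' algorithm: `f1`, `f2`, and the sample points are hypothetical, and `upper_bound` assumes that `f1` is non-increasing and `f2` is non-decreasing, so the best possible score inside a rectangle is reached at one of its corners. Under that assumption an R-tree-like index could skip any node whose bound is below the current k-th best score.

```python
import heapq

def f1(x1):
    # Hypothetical preference: smaller x1 is better (e.g. a normalized price).
    return 1.0 - x1

def f2(x2):
    # Hypothetical preference: larger x2 is better (e.g. a normalized area).
    return x2

def h(x1, x2):
    # Overall score as on the slide: h(x1, x2) = f1(x1) + f2(x2).
    return f1(x1) + f2(x2)

def top_k(points, k):
    # Brute-force top-k: score every point and keep the k best.
    return heapq.nlargest(k, points, key=lambda p: h(*p))

def upper_bound(rect):
    # rect = ((lo1, hi1), (lo2, hi2)). Under the monotonicity assumptions
    # stated above, no point inside rect can score higher than this value.
    (lo1, _hi1), (_lo2, hi2) = rect
    return f1(lo1) + f2(hi2)

points = [(0.2, 0.9), (0.8, 0.1), (0.5, 0.5), (0.1, 0.3)]
best = top_k(points, 2)
```

Here a node with rectangle `((0.5, 1.0), (0.0, 0.2))` has an upper bound lower than the score of both points in `best`, so a top-2 search would not need to open it.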

Martin Šumák, Peter Gurský at ADBIS 2013


Why highly redundant point data

Our data consists of flats with the following attributes:
- price
- area
- floor
- max floor of building
- year of approbation
- number of rooms

Each flat is represented by a point in 6-dimensional space.
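As a small illustration of what such data might look like, the sketch below models a flat as a 6-tuple and counts the distinct values per dimension. The field names and all sample values are invented for the example, and reading "redundant" as "many points share coordinate values" is an assumption, not a definition taken from the slides.

```python
from typing import NamedTuple

class Flat(NamedTuple):
    # One field per attribute listed on the slide (names are hypothetical).
    price: float
    area: float
    floor: int
    max_floor: int
    year: int
    rooms: int

# Made-up sample: flats in similar buildings repeat most coordinate values.
flats = [
    Flat(95000, 54.0, 3, 8, 1985, 2),
    Flat(95000, 54.0, 5, 8, 1985, 2),
    Flat(120000, 68.0, 3, 8, 1985, 3),
    Flat(95000, 54.0, 3, 12, 1992, 2),
]

# Number of distinct values in each of the 6 dimensions.
distinct_per_dim = [len({flat[i] for flat in flats}) for i in range(6)]
```

With only two distinct values per dimension, many points lie on the same axis-parallel hyperplanes, which is the kind of data the slides target.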


R+-tree fundamentals

The R+-tree is an R-tree-like index with the following special properties:
- zero overlaps between nodes at the same level
- the rectangles of a node's children cover the whole rectangle of the parent
- suitable for point data and point queries
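The first two properties can be checked on a toy example. The sketch below splits a parent rectangle along one axis, which is one simple way (assumed here, not taken from the slides) to obtain children that do not overlap and together cover the parent. The helper names are hypothetical.

```python
from math import prod

def interiors_overlap(a, b):
    # Rectangles are tuples of (lo, hi) pairs, one pair per dimension.
    # Rectangles that only touch along a border do not count as overlapping.
    return all(lo_a < hi_b and lo_b < hi_a
               for (lo_a, hi_a), (lo_b, hi_b) in zip(a, b))

def split(rect, axis, value):
    # Cut rect into two pieces along the given axis at the given value.
    lo, hi = rect[axis]
    left = rect[:axis] + ((lo, value),) + rect[axis + 1:]
    right = rect[:axis] + ((value, hi),) + rect[axis + 1:]
    return left, right

def volume(rect):
    return prod(hi - lo for lo, hi in rect)

parent = ((0.0, 10.0), (0.0, 4.0))
left, right = split(parent, axis=0, value=6.0)
```

The two pieces have disjoint interiors and their volumes add up to the parent's volume, so every point of the parent falls into at least one child.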


R+-tree fundamentals

Desired state: zero overlaps and minimum bounding rectangles.

The R+-tree avoids overlaps at the cost of rectangle size.



The R++-tree idea

Desired state: zero overlaps and minimum bounding rectangles.

R++-tree inner nodes keep two rectangles for each child node: the minimum bounding one and the parent-covering one.

Leaf nodes are left unchanged.


Nodes of R++-tree

Leaf nodes
- Exactly the same as leaf nodes of the R+-tree
- Contain an ID and coordinates for each object
- Take one disk page each

Inner nodes
- Contain a pointer and two rectangles for each child node
- Take two disk pages each
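The node layout described above might be modelled as follows. This is only a sketch: the class and field names are hypothetical, and `child` is an in-memory reference standing in for the on-disk pointer mentioned on the slide.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union

Rect = Tuple[Tuple[float, float], ...]  # one (lo, hi) pair per dimension

@dataclass
class LeafEntry:
    object_id: int
    coords: Tuple[float, ...]

@dataclass
class LeafNode:
    entries: List[LeafEntry] = field(default_factory=list)
    PAGES = 1  # leaf nodes take one disk page each

@dataclass
class InnerEntry:
    child: Union["InnerNode", LeafNode]
    mbr: Rect    # minimum bounding rectangle of the child
    cover: Rect  # parent-covering rectangle of the child

@dataclass
class InnerNode:
    entries: List[InnerEntry] = field(default_factory=list)
    PAGES = 2  # inner nodes take two disk pages each

def total_pages(node):
    # Count the pages used by a subtree.
    if isinstance(node, LeafNode):
        return node.PAGES
    return node.PAGES + sum(total_pages(e.child) for e in node.entries)

leaf_a = LeafNode([LeafEntry(1, (1.0, 1.0))])
leaf_b = LeafNode([LeafEntry(2, (7.0, 8.0))])
root = InnerNode([
    InnerEntry(leaf_a, mbr=((1.0, 1.0), (1.0, 1.0)), cover=((0.0, 5.0), (0.0, 10.0))),
    InnerEntry(leaf_b, mbr=((7.0, 7.0), (8.0, 8.0)), cover=((5.0, 10.0), (0.0, 10.0))),
])
```

For this two-leaf tree, `total_pages(root)` counts two pages for the root and one per leaf.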


Using two rectangles in inner nodes

Searching
- Only the minimum bounding rectangles are necessary

Inserting new objects
- Both the minimum bounding and the parent-covering rectangles need to be used (read/updated)
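The division of labour between the two rectangles can be sketched as below. This is an assumed, simplified reading of the slide: the insert routine picks the child by its parent-covering rectangle and then enlarges the child's minimum bounding rectangle, while the point query looks only at minimum bounding rectangles. Node splitting on overflow is omitted, and all class and function names are hypothetical.

```python
def contains(rect, p):
    # rect is a tuple of (lo, hi) pairs, or None for a still-empty child.
    return rect is not None and all(lo <= x <= hi for (lo, hi), x in zip(rect, p))

def enlarge(rect, p):
    # Smallest rectangle that contains both rect and the point p.
    if rect is None:
        return tuple((x, x) for x in p)
    return tuple((min(lo, x), max(hi, x)) for (lo, hi), x in zip(rect, p))

class Leaf:
    def __init__(self):
        self.points = []

class Entry:
    def __init__(self, child, cover):
        self.child = child
        self.cover = cover  # parent-covering rectangle
        self.mbr = None     # minimum bounding rectangle, empty at first

class Inner:
    def __init__(self, entries):
        self.entries = entries

def insert(node, p):
    if isinstance(node, Leaf):
        node.points.append(p)
        return
    for e in node.entries:
        if contains(e.cover, p):          # choose the child by its cover
            e.mbr = enlarge(e.mbr, p)     # keep the MBR tight around the data
            insert(e.child, p)
            return
    raise ValueError("point lies outside the indexed space")

def point_query(node, p):
    if isinstance(node, Leaf):
        return p in node.points
    # Searching consults only the minimum bounding rectangles.
    return any(contains(e.mbr, p) and point_query(e.child, p) for e in node.entries)

root = Inner([Entry(Leaf(), ((0, 5), (0, 10))), Entry(Leaf(), ((5, 10), (0, 10)))])
for point in [(1, 1), (2, 3), (7, 8)]:
    insert(root, point)
```

A query for `(4, 4)` is rejected at the root without visiting any leaf, because the point lies inside the first child's covering rectangle but outside its minimum bounding rectangle.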


Implementation of inner nodes

- The first page contains the minimum bounding rectangles
- The second page contains the parent-covering rectangles
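One possible byte layout for such a two-page inner node is sketched below with Python's `struct` module. Several details are assumptions made for the example and are not stated on the slides: child pointers are stored as 32-bit page IDs on the first page next to the minimum bounding rectangles, coordinates are 64-bit floats, the dimension is fixed at 2, and padding up to a real page size is left out.

```python
import struct

DIM = 2
RECT_FMT = f"<{2 * DIM}d"            # lo and hi per dimension, little-endian doubles
RECT_SIZE = struct.calcsize(RECT_FMT)

def pack_inner_node(entries):
    # entries: list of (child_page_id, mbr, cover), rectangles given as flat
    # tuples (lo1, hi1, lo2, hi2).
    page1 = struct.pack("<I", len(entries))   # search page: count, pointers, MBRs
    page2 = b""                               # update page: covering rectangles
    for child_page_id, mbr, cover in entries:
        page1 += struct.pack("<I", child_page_id) + struct.pack(RECT_FMT, *mbr)
        page2 += struct.pack(RECT_FMT, *cover)
    return page1, page2

def unpack_search_page(page1):
    # Everything a search needs is recovered from the first page alone.
    (count,) = struct.unpack_from("<I", page1, 0)
    offset = 4
    result = []
    for _ in range(count):
        (child_page_id,) = struct.unpack_from("<I", page1, offset)
        mbr = struct.unpack_from(RECT_FMT, page1, offset + 4)
        result.append((child_page_id, mbr))
        offset += 4 + RECT_SIZE
    return result

entries = [
    (7, (1.0, 2.0, 1.0, 3.0), (0.0, 5.0, 0.0, 10.0)),
    (9, (7.0, 7.0, 8.0, 8.0), (5.0, 10.0, 0.0, 10.0)),
]
page1, page2 = pack_inner_node(entries)
```

Keeping the pointers with the minimum bounding rectangles means a search never has to touch `page2`.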


Advantages and drawbacks of the two-page idea

Advantages
- Searching requires reading only one page per node involved
- The ratio between page size and node capacity is the same as in the R+-tree

Drawbacks
- When updating, two pages per inner node need to be processed
- The real impact on the whole index size is relatively low
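A rough calculation suggests why doubling the inner nodes has little effect on total index size. The sketch assumes completely full nodes and the same fanout on every level; the leaf count and fanout are illustrative values, not figures from the experiments.

```python
from math import ceil

def inner_node_count(leaf_count, fanout):
    # Sum the number of nodes on every level above the leaves.
    total = 0
    level = leaf_count
    while level > 1:
        level = ceil(level / fanout)
        total += level
    return total

leaves = 10_000   # hypothetical number of leaf pages
fanout = 50       # hypothetical number of children per inner node

inner = inner_node_count(leaves, fanout)   # 200 + 4 + 1 nodes
r_plus_pages = leaves + inner              # one page per node
r_plus_plus_pages = leaves + 2 * inner     # two pages per inner node
growth = (r_plus_plus_pages - r_plus_pages) / r_plus_pages
```

With these numbers the index grows by 205 pages out of 10 205, roughly 2 %, because inner nodes are a small fraction of all nodes when the fanout is large.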


Experiments - data

Artificial data (range, kNN and top-k queries)
- 100 000 random points in 2- to 10-dimensional space
- Three value domains: decimal values within [0; 1], integer values from 1 to 100, integer values from 1 to 10

Pseudo-real data (top-k query)
- 6-dimensional points: data of flats for sale
- 550 000 flats (20-multiple set)
- 2 700 000 flats (100-multiple set)
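Artificial data of this kind could be generated along the following lines. This is a sketch and not the authors' generator: the function name, the `kind` labels, the seed, and the reduced sample size are all choices made for the example, and `random.random()` draws from [0, 1) rather than the closed interval.

```python
import random

def random_points(n, dim, kind, rng):
    # kind selects one of the three value domains listed on the slide.
    if kind == "decimal":
        return [tuple(rng.random() for _ in range(dim)) for _ in range(n)]
    upper = {"int100": 100, "int10": 10}[kind]
    return [tuple(rng.randint(1, upper) for _ in range(dim)) for _ in range(n)]

rng = random.Random(0)   # fixed seed so that the run is repeatable
n, dim = 1000, 2         # the slides use 100 000 points and 2 to 10 dimensions

# Count how many distinct points each value domain produces.
distinct = {
    kind: len(set(random_points(n, dim, kind, rng)))
    for kind in ("decimal", "int100", "int10")
}
```

In two dimensions the 1 to 10 domain allows at most 100 distinct points, so most of the generated points are duplicates. The smaller the value domain, the more redundant the data set becomes.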


Experiments - measures

- 300 random queries for each data set and query type
- Average time per query
- Average number of I/Os per query (one I/O corresponds to reading one page, i.e. processing one node)


[Figure] Artificial data: 100 000 random points with decimal values within [0; 1]

[Figure] Artificial data: 100 000 random points with integer values from 1 to 100

[Figure] Artificial data: 100 000 random points with integer values from 1 to 10

[Figure] Pseudo-real data: 550 000 flats (i.e. 6-dimensional points)

[Figure] Pseudo-real data: 2 700 000 flats (i.e. 6-dimensional points)


Thank you for your attention
