R++-tree: an efficient spatial access method for highly redundant point data

Page 1: R ++ -tree :  an  efficient spatial access method for highly redundant point data

R++-tree: an efficient spatial access method for highly redundant point data

Martin Šumák, Peter Gurský

University of P. J. Šafárik in Košice

Page 2: R ++ -tree :  an  efficient spatial access method for highly redundant point data

Research motivation

Besides kNN and range queries, an R-tree-like index can also be used to compute top-k queries (find the best k objects according to user preferences), e.g. with preferences combined as

h(x1, x2) = f1(x1) + f2(x2)
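To make the additive preference function concrete, here is a minimal sketch of a top-k query over 2-dimensional points. The local preference functions `f1` and `f2` are invented for illustration (lower first attribute is better, higher second attribute is better); the slides do not define them. The `upper_bound` helper hints at why a rectangle-based index can help: under these monotone assumptions, the best possible score inside a rectangle is known from its corners, so a whole node could be skipped when that bound is too low.

```python
import heapq

def f1(x1):
    # Hypothetical local preference: smaller x1 (e.g. normalized price) is better.
    return 1.0 - x1

def f2(x2):
    # Hypothetical local preference: larger x2 (e.g. normalized area) is better.
    return x2

def h(x1, x2):
    # Additive aggregation of the local preferences, as on the slide.
    return f1(x1) + f2(x2)

def top_k(points, k):
    # Brute-force reference: the k points with the highest score, best first.
    return heapq.nlargest(k, points, key=lambda p: h(*p))

def upper_bound(lo, hi):
    # Best score any point inside the rectangle [lo, hi] can reach,
    # valid only because f1 is decreasing and f2 is increasing here.
    return f1(lo[0]) + f2(hi[1])

points = [(0.2, 0.5), (0.9, 0.9), (0.1, 0.1), (0.5, 1.0)]
best_two = top_k(points, 2)
```

The brute-force `top_k` scans every point; an index-based evaluation would instead visit nodes in order of their upper bound.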

Martin Šumák, Peter Gurský at ADBIS 2013

Page 3: R ++ -tree :  an  efficient spatial access method for highly redundant point data

Why highly redundant point data

Our data consists of flats with the following attributes: price, area, floor, max floor of building, year of approval, and number of rooms

Each flat is represented by a point in 6-dimensional space
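As a rough illustration of what redundancy means for such data, the sketch below maps flats to 6-dimensional points and counts the distinct values in each dimension. The field names and sample values are made up for the example; the slides list only the attribute names.

```python
from collections import namedtuple

# Hypothetical record type; field order follows the attribute list on the slide.
Flat = namedtuple("Flat", "price area floor max_floor year_of_approval rooms")

def distinct_per_dimension(flats):
    # Number of distinct values in each of the 6 dimensions.
    return [len({flat[d] for flat in flats}) for d in range(len(Flat._fields))]

# Invented sample data, for illustration only.
flats = [
    Flat(100000, 60, 2, 5, 1985, 3),
    Flat(120000, 60, 2, 5, 1985, 3),
    Flat(100000, 45, 1, 5, 1990, 2),
]
counts = distinct_per_dimension(flats)
```

Attributes such as floor or number of rooms presumably take few distinct values, so many points share coordinates in several dimensions.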

Page 4: R ++ -tree :  an  efficient spatial access method for highly redundant point data

R+-tree fundamentals

R+-tree is an R-tree-like index with the following specialities:

  zero overlaps between nodes at the same level

  rectangles of nodes cover all of the parent's rectangle

  suitable for point data and point queries
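The zero-overlap property is what makes point queries cheap: at each level at most one child can contain the query point. A minimal sketch, assuming rectangles that are half-open on their upper side so that a point on a shared border belongs to exactly one sibling (this border convention is an assumption, not taken from the slides):

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class Rect:
    lo: Tuple[float, ...]
    hi: Tuple[float, ...]

    def contains_point(self, p):
        # Half-open on the upper side: lo <= x < hi in every dimension.
        return all(l <= x < h for l, x, h in zip(self.lo, p, self.hi))

    def overlaps(self, other):
        # True when the interiors intersect in every dimension.
        return all(a_lo < b_hi and b_lo < a_hi
                   for a_lo, a_hi, b_lo, b_hi
                   in zip(self.lo, self.hi, other.lo, other.hi))

def siblings_are_disjoint(children):
    # R+-tree invariant: no two rectangles at the same level overlap.
    return not any(a.overlaps(b)
                   for i, a in enumerate(children)
                   for b in children[i + 1:])

def children_for_point(children, p):
    # With disjoint siblings this list has at most one element.
    return [c for c in children if c.contains_point(p)]

left = Rect((0, 0), (5, 10))
right = Rect((5, 0), (10, 10))
```

Here `left` and `right` together cover the parent rectangle from (0, 0) to (10, 10), and a point query follows a single path from the root to a leaf.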

Page 5: R ++ -tree :  an  efficient spatial access method for highly redundant point data

R+-tree fundamentals

Desired state: zero overlaps and minimum bounding rectangles

R+-tree avoids overlaps at the cost of rectangle size

Page 6: R ++ -tree :  an  efficient spatial access method for highly redundant point data

The R++-tree idea

Desired state: zero overlaps and minimum bounding rectangles

R++-tree inner nodes keep two rectangles for each child node: the minimum bounding one and the parent covering one

Page 7: R ++ -tree :  an  efficient spatial access method for highly redundant point data

The R++-tree idea

Desired state: zero overlaps and minimum bounding rectangles

R++-tree inner nodes keep two rectangles for each child node: the minimum bounding one and the parent covering one

Leaf nodes are left unchanged
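The difference between the two rectangles can be shown with a small sketch. The covering rectangle and the points below are invented for the example; rectangles are represented as `(lo, hi)` pairs of coordinate tuples.

```python
def mbr(points):
    # Minimum bounding rectangle of a non-empty list of points.
    dims = range(len(points[0]))
    lo = tuple(min(p[d] for p in points) for d in dims)
    hi = tuple(max(p[d] for p in points) for d in dims)
    return lo, hi

def volume(rect):
    # Product of the side lengths (area in 2 dimensions).
    lo, hi = rect
    result = 1
    for l, h in zip(lo, hi):
        result *= h - l
    return result

def intersects(a, b):
    # Closed-rectangle intersection test.
    return all(a_lo <= b_hi and b_lo <= a_hi
               for a_lo, a_hi, b_lo, b_hi in zip(a[0], a[1], b[0], b[1]))

# Hypothetical child node: its part of the parent's rectangle and its points.
covering = ((0, 0), (5, 10))
child_points = [(1, 2), (2, 2), (1, 4)]
minimum = mbr(child_points)

# A range query that hits the covering rectangle but misses every point.
query = ((3, 5), (4, 6))
```

With only the covering rectangle, the query would descend into the child; with the minimum bounding rectangle, the child can be skipped.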

Page 8: R ++ -tree :  an  efficient spatial access method for highly redundant point data

Nodes of R++-tree

Leaf nodes

  Exactly the same as leaf nodes of R+-tree

  Contain an id and coordinates for each object

  Take one disk page each

Inner nodes

  Contain a pointer and two rectangles for each child node

  Take two disk pages each
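The two node kinds described above could be modelled as follows. This is an in-memory sketch only; the class and field names are illustrative and not taken from the authors' implementation.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Point = Tuple[float, ...]
Rect = Tuple[Point, Point]  # (lower corner, upper corner)

@dataclass
class LeafNode:
    # One (object id, coordinates) entry per stored object.
    entries: List[Tuple[int, Point]] = field(default_factory=list)
    PAGES = 1  # class-level constant: a leaf takes one disk page

@dataclass
class InnerNode:
    # Parallel lists: entry i describes child i.
    children: List[int] = field(default_factory=list)  # child page pointers
    mbrs: List[Rect] = field(default_factory=list)     # minimum bounding rectangles
    covers: List[Rect] = field(default_factory=list)   # parent covering rectangles
    PAGES = 2  # class-level constant: an inner node takes two disk pages

    def add_child(self, pointer, mbr, cover):
        self.children.append(pointer)
        self.mbrs.append(mbr)
        self.covers.append(cover)

node = InnerNode()
node.add_child(7, ((1, 2), (2, 4)), ((0, 0), (5, 10)))
```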

Page 9: R ++ -tree :  an  efficient spatial access method for highly redundant point data

Using the two rectangles in inner nodes

Searching

  Only the minimum bounding rectangles are necessary

Inserting new objects

  Both the minimum bounding and the parent covering rectangles need to be used (read/updated)
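The division of labour between the two rectangles might look like the sketch below: the range search consults only the minimum bounding rectangles, while insertion picks the child by its covering rectangle and then enlarges that child's minimum bounding rectangle. Node splitting is omitted, and all names are illustrative assumptions rather than the authors' code.

```python
class Leaf:
    def __init__(self):
        self.points = []

class Inner:
    def __init__(self, children, covers):
        self.children = children
        self.covers = covers                # parent covering rectangles
        self.mbrs = [None] * len(children)  # minimum bounding rectangles

def contains(rect, p):
    lo, hi = rect
    return all(l <= x <= h for l, x, h in zip(lo, p, hi))

def intersects(a, b):
    return all(a_lo <= b_hi and b_lo <= a_hi
               for a_lo, a_hi, b_lo, b_hi in zip(a[0], a[1], b[0], b[1]))

def extend(mbr, p):
    # Smallest rectangle containing both the old MBR and the new point.
    if mbr is None:
        return (p, p)
    lo, hi = mbr
    return (tuple(map(min, lo, p)), tuple(map(max, hi, p)))

def range_search(node, query, out):
    if isinstance(node, Leaf):
        out.extend(p for p in node.points if contains(query, p))
        return out
    # Searching: only the minimum bounding rectangles are consulted.
    for child, mbr in zip(node.children, node.mbrs):
        if mbr is not None and intersects(mbr, query):
            range_search(child, query, out)
    return out

def insert(node, p):
    if isinstance(node, Leaf):
        node.points.append(p)  # splitting a full leaf is not shown
        return
    # Inserting: the covering rectangle selects the child (first match wins
    # on shared borders), then the child's MBR is updated.
    for i, cover in enumerate(node.covers):
        if contains(cover, p):
            node.mbrs[i] = extend(node.mbrs[i], p)
            insert(node.children[i], p)
            return
    raise ValueError("point lies outside the indexed space")

root = Inner([Leaf(), Leaf()], [((0, 0), (5, 10)), ((5, 0), (10, 10))])
for point in [(1, 2), (2, 4), (7, 7)]:
    insert(root, point)
found = range_search(root, ((0, 0), (3, 3)), [])
```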

Page 10: R ++ -tree :  an  efficient spatial access method for highly redundant point data

Implementation of inner nodes

First page contains the minimum bounding rectangles

Second page contains the parent covering rectangles
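One way to picture the two-page layout is the serialization sketch below. The page size, the 2-dimensional rectangles, the entry-count header and the placement of child pointers on the first page are all assumptions made for the example; the slides state only which rectangles go on which page.

```python
import struct

PAGE_SIZE = 4096                          # assumed page size in bytes
DIMS = 2                                  # assumed dimensionality
RECT = struct.Struct("<%dd" % (2 * DIMS))  # lo coordinates, then hi coordinates
HEADER = struct.Struct("<I")              # number of entries on the page
POINTER = struct.Struct("<Q")             # child page pointer

def pack_inner_node(pointers, mbrs, covers):
    # First page: pointers and minimum bounding rectangles (enough for search).
    # Second page: parent covering rectangles (needed only for updates).
    first = bytearray(HEADER.pack(len(pointers)))
    second = bytearray(HEADER.pack(len(pointers)))
    for pointer, mbr, cover in zip(pointers, mbrs, covers):
        first += POINTER.pack(pointer) + RECT.pack(*mbr)
        second += RECT.pack(*cover)
    if len(first) > PAGE_SIZE:
        raise ValueError("too many entries for one page")
    return (bytes(first.ljust(PAGE_SIZE, b"\0")),
            bytes(second.ljust(PAGE_SIZE, b"\0")))

def read_search_page(first_page):
    # Everything a search needs, decoded from the first page alone.
    (count,) = HEADER.unpack_from(first_page, 0)
    offset = HEADER.size
    entries = []
    for _ in range(count):
        (pointer,) = POINTER.unpack_from(first_page, offset)
        offset += POINTER.size
        mbr = RECT.unpack_from(first_page, offset)
        offset += RECT.size
        entries.append((pointer, mbr))
    return entries

page1, page2 = pack_inner_node(
    [7, 9],
    [(1.0, 2.0, 2.0, 4.0), (7.0, 7.0, 7.0, 7.0)],
    [(0.0, 0.0, 5.0, 10.0), (5.0, 0.0, 10.0, 10.0)],
)
```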

Page 11: R ++ -tree :  an  efficient spatial access method for highly redundant point data

Advantages and drawbacks of the two-page idea

Advantages

  Searching requires reading one page per node involved

  The ratio between page size and node capacity is the same as in R+-tree

Drawbacks

  When updating, two pages per inner node need to be processed

  The real impact on the whole index size is relatively low
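A back-of-the-envelope calculation suggests why the size impact stays low: inner nodes are a small fraction of all nodes, so doubling only their pages adds little. The capacities below are hypothetical and the nodes are assumed to be fully packed; neither figure comes from the slides.

```python
import math

def count_nodes(num_points, leaf_capacity, fanout):
    # Number of leaf and inner nodes in a fully packed tree.
    leaves = math.ceil(num_points / leaf_capacity)
    inner = 0
    level = leaves
    while level > 1:
        level = math.ceil(level / fanout)
        inner += level
    return leaves, inner

# Hypothetical parameters: 100 000 points, 100 per leaf, 50 children per inner node.
leaves, inner = count_nodes(100_000, leaf_capacity=100, fanout=50)
pages_one_page_inner = leaves + inner      # one page per node
pages_two_page_inner = leaves + 2 * inner  # two pages per inner node
overhead = (pages_two_page_inner - pages_one_page_inner) / pages_one_page_inner
```

Under these assumptions the tree has 1000 leaves and 21 inner nodes, so the second page per inner node adds about 2% to the total page count.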

Page 12: R ++ -tree :  an  efficient spatial access method for highly redundant point data

Experiments - data

Artificial data (range, kNN and top-k queries)

  100 000 random points in 2- to 10-dimensional space, with one of:

  decimal values within [0; 1]

  integer values from 1 to 100

  integer values from 1 to 10

Pseudo-real data (top-k query)

  6-dimensional points: data of flats for sale

  550 000 flats (20-multiple set)

  2 700 000 flats (100-multiple set)
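The three kinds of artificial data could be generated along these lines. A uniform distribution and a fixed seed are assumptions; the slides say only that the points are random. Note that `random.random()` draws from [0, 1), a close stand-in for the closed interval on the slide.

```python
import random

def generate(n, dims, kind, seed=0):
    # kind: "decimal" (values in [0, 1)), "int100" (1..100) or "int10" (1..10).
    rng = random.Random(seed)
    if kind == "decimal":
        draw = rng.random
    elif kind == "int100":
        draw = lambda: rng.randint(1, 100)
    elif kind == "int10":
        draw = lambda: rng.randint(1, 10)
    else:
        raise ValueError("unknown kind: %r" % kind)
    return [tuple(draw() for _ in range(dims)) for _ in range(n)]

# A small sample; the experiments on the slide use 100 000 points.
sample = generate(1000, dims=2, kind="int10")
distinct = len(set(sample))
```

With integer values from 1 to 10 in 2 dimensions there are only 100 possible points, so a sample of 1000 necessarily contains many duplicates, which is the kind of redundancy the talk targets.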

Page 13: R ++ -tree :  an  efficient spatial access method for highly redundant point data

Experiments - measures

300 random queries per data set and query type

Average time per query

Average number of I/Os per query

  One I/O corresponds to reading one page, i.e. processing one node
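A measurement loop for these two averages might look like the sketch below. The `PageCounter` class and the `run_query(query, counter)` callback are hypothetical; a real index would call `counter.read()` each time it loads a page.

```python
import time

class PageCounter:
    # Counts page reads; one read corresponds to one I/O.
    def __init__(self):
        self.reads = 0

    def read(self):
        self.reads += 1

def measure(run_query, queries):
    # Returns (average seconds per query, average I/Os per query).
    total_time = 0.0
    total_io = 0
    for query in queries:
        counter = PageCounter()
        start = time.perf_counter()
        run_query(query, counter)
        total_time += time.perf_counter() - start
        total_io += counter.reads
    return total_time / len(queries), total_io / len(queries)

def fake_query(query, counter):
    # Stand-in for an index lookup: pretends to read `query` pages.
    for _ in range(query):
        counter.read()

avg_time, avg_io = measure(fake_query, [1, 2, 3])
```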

Page 14: R ++ -tree :  an  efficient spatial access method for highly redundant point data

Artificial data: 100 000 random points with decimal values within [0; 1]

Page 15: R ++ -tree :  an  efficient spatial access method for highly redundant point data

Artificial data: 100 000 random points with decimal values within [0; 1]

Page 16: R ++ -tree :  an  efficient spatial access method for highly redundant point data

Artificial data: 100 000 random points with decimal values within [0; 1]

Page 17: R ++ -tree :  an  efficient spatial access method for highly redundant point data

Artificial data: 100 000 random points with integer values from 1 to 100

Page 18: R ++ -tree :  an  efficient spatial access method for highly redundant point data

Artificial data: 100 000 random points with integer values from 1 to 100

Page 19: R ++ -tree :  an  efficient spatial access method for highly redundant point data

Artificial data: 100 000 random points with integer values from 1 to 100

Page 20: R ++ -tree :  an  efficient spatial access method for highly redundant point data

Artificial data: 100 000 random points with integer values from 1 to 10

Page 21: R ++ -tree :  an  efficient spatial access method for highly redundant point data

Artificial data: 100 000 random points with integer values from 1 to 10

Page 22: R ++ -tree :  an  efficient spatial access method for highly redundant point data

Artificial data: 100 000 random points with integer values from 1 to 10

Page 23: R ++ -tree :  an  efficient spatial access method for highly redundant point data

Pseudo-real data: 550 000 flats (i.e. 6-dimensional points)

Page 24: R ++ -tree :  an  efficient spatial access method for highly redundant point data

Pseudo-real data: 550 000 flats (i.e. 6-dimensional points)

Page 25: R ++ -tree :  an  efficient spatial access method for highly redundant point data

Pseudo-real data: 2 700 000 flats (i.e. 6-dimensional points)

Page 26: R ++ -tree :  an  efficient spatial access method for highly redundant point data

Thank you for your attention