Spatial Indexing SAMs. Spatial Access Methods PAMs Grid File kd-tree based (LSD-, hB- trees)...

  • Date posted: 19-Dec-2015

Page 1: Spatial Indexing SAMs. Spatial Access Methods PAMs Grid File kd-tree based (LSD-, hB- trees) Z-ordering + B+-tree R-tree Variations: R*-tree, Hilbert.

Spatial Indexing

SAMs

Page 2:

Spatial Access Methods

PAMs

  • Grid File
  • kd-tree based (LSD-, hB-trees)
  • Z-ordering + B+-tree

R-tree

  • Variations: R*-tree, Hilbert R-tree

Page 3:

R-tree

[Figure: data rectangles A to J grouped into parent MBRs P1 to P4. The root node holds P1 P2 P3 P4; the leaf nodes hold {A, B, C}, {D, E}, {F, G} and {H, I, J}.]

  • Multi-way external memory structure, indexes MBRs
  • Dynamic structure

Page 4:

R-tree: properties

Main points:

  • every parent node completely covers its ‘children’
  • a child MBR may be covered by more than one parent; it is stored under ONLY ONE of them (i.e., no need for duplicate elimination)
  • a point query may follow multiple branches
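The three properties can be illustrated with a minimal sketch. It assumes 2-d rectangles stored as `(x_low, y_low, x_high, y_high)` tuples; the names `Node`, `union`, `covers` and `branches_for_point` are hypothetical helpers chosen for illustration, not part of the slides.

```python
from dataclasses import dataclass, field


def union(rects):
    """Smallest rectangle enclosing all given rectangles (their MBR)."""
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


def covers(outer, inner):
    """True if `outer` completely contains `inner`."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and outer[2] >= inner[2] and outer[3] >= inner[3])


def contains_point(r, p):
    return r[0] <= p[0] <= r[2] and r[1] <= p[1] <= r[3]


@dataclass
class Node:
    # Child Node objects for an internal node, data rectangles for a leaf.
    children: list = field(default_factory=list)
    is_leaf: bool = True

    @property
    def mbr(self):
        rects = self.children if self.is_leaf else [c.mbr for c in self.children]
        return union(rects)


def branches_for_point(node, p):
    """Children of an internal node whose MBR contains p (may be several)."""
    return [c for c in node.children if contains_point(c.mbr, p)]


# Two leaves whose MBRs overlap; each data rectangle is stored in one leaf only.
left = Node(children=[(0, 0, 4, 4), (1, 1, 6, 6)])
right = Node(children=[(5, 5, 9, 9), (3, 3, 7, 8)])
root = Node(children=[left, right], is_leaf=False)

print(covers(root.mbr, left.mbr), covers(root.mbr, right.mbr))
print(len(branches_for_point(root, (5, 5))))  # this point falls in both child MBRs
```

Because sibling MBRs may overlap, a point query at (5, 5) has to descend into both leaves, while a query at (1, 1) follows a single branch.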

Page 5:

R-tree

  • The original R-tree tries to minimize the area of each enclosing rectangle in the index nodes.
  • Is there any other property that can be optimized?

R*-tree: Yes!

Page 6:

R*-tree

Optimization Criteria:

  • (O1) Area covered by an index MBR
  • (O2) Overlap between directory MBRs
  • (O3) Margin of a directory rectangle
  • (O4) Storage utilization

Sometimes it is impossible to optimize all the above criteria at the same time!
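To make the four criteria concrete, here is a small sketch of how each quantity could be computed for 2-d rectangles given as `(x_low, y_low, x_high, y_high)` tuples. The function names are illustrative assumptions; "margin" is taken to be the perimeter of the rectangle.

```python
def area(r):
    """(O1) Area of an MBR."""
    return (r[2] - r[0]) * (r[3] - r[1])


def overlap(a, b):
    """(O2) Area of the intersection of two MBRs (0 if they are disjoint)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0


def margin(r):
    """(O3) Margin of an MBR, i.e. the sum of its edge lengths (perimeter)."""
    return 2 * ((r[2] - r[0]) + (r[3] - r[1]))


def utilization(entries, capacity):
    """(O4) Fraction of a node's slots that are in use."""
    return len(entries) / capacity


square = (0, 0, 2, 2)
sliver = (0, 0, 4, 1)
print(area(square), area(sliver))      # same area
print(margin(square), margin(sliver))  # the square has the smaller margin
print(overlap(square, (1, 1, 3, 3)))
```

The two rectangles have the same area but different margins, which is one way the criteria can pull in different directions: minimizing margin favours square-like rectangles, which area alone does not distinguish.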

Page 7:

R*-tree

ChooseSubtree:

  • If the next node is a leaf node, choose the node using the following criteria, in this order (each one breaks ties of the previous one):
      • Least overlap enlargement
      • Least area enlargement
      • Smaller area
  • Else:
      • Least area enlargement
      • Smaller area
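The selection rule can be sketched as a lexicographic comparison. This is a simplified illustration under assumptions: rectangles are `(x_low, y_low, x_high, y_high)` tuples, overlap enlargement is computed against all siblings, and the function names are hypothetical.

```python
def area(r):
    return (r[2] - r[0]) * (r[3] - r[1])


def enlarge(r, new):
    """MBR of r after it absorbs the rectangle `new`."""
    return (min(r[0], new[0]), min(r[1], new[1]),
            max(r[2], new[2]), max(r[3], new[3]))


def overlap(a, b):
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0


def area_enlargement(r, new):
    return area(enlarge(r, new)) - area(r)


def overlap_enlargement(i, rects, new):
    """Growth in overlap between rects[i] and its siblings if it absorbs `new`."""
    grown = enlarge(rects[i], new)
    siblings = [s for j, s in enumerate(rects) if j != i]
    before = sum(overlap(rects[i], s) for s in siblings)
    after = sum(overlap(grown, s) for s in siblings)
    return after - before


def choose_subtree(rects, new, children_are_leaves):
    """Index of the child MBR in `rects` that should receive `new`."""
    def key(i):
        k = (area_enlargement(rects[i], new), area(rects[i]))
        if children_are_leaves:
            # Overlap enlargement comes first; the rest only breaks ties.
            k = (overlap_enlargement(i, rects, new),) + k
        return k
    return min(range(len(rects)), key=key)


rects = [(0, 0, 2, 2), (2.5, 0, 20, 2.5)]
new = (3, 3, 3.5, 3.5)
print(choose_subtree(rects, new, children_are_leaves=True))
print(choose_subtree(rects, new, children_are_leaves=False))
```

In this example the two rules disagree: growing the first rectangle needs less extra area but makes it overlap its sibling, so the leaf-level rule picks the second rectangle instead.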

Page 8:

R*-tree SplitNode

  • Choose the axis to split
  • Choose the two groups along the chosen axis

ChooseSplitAxis

  • Along each axis, sort the rectangles and break them into two groups (M-2m+2 possible ways, where each group contains at least m rectangles).
  • For each axis, compute the sum S of the margin-values of all these pairs of groups.
  • Choose the axis that minimizes S.

ChooseSplitIndex

  • Along the chosen axis, choose the grouping that gives the minimum overlap-value.
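A compact sketch of the two steps follows. It assumes 2-d rectangles as `(x_low, y_low, x_high, y_high)` tuples, a node that overflowed with M+1 entries, and minimum fill m; sorting by both the lower and the upper coordinate, and breaking overlap ties by total area, are assumptions taken from the usual description of the R*-tree split rather than from the slide.

```python
def mbr(rects):
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


def area(r):
    return (r[2] - r[0]) * (r[3] - r[1])


def margin(r):
    return 2 * ((r[2] - r[0]) + (r[3] - r[1]))


def overlap(a, b):
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0


def distributions(ordered, m):
    """All M-2m+2 ways to cut M+1 sorted rectangles into two groups of >= m."""
    for k in range(m, len(ordered) - m + 1):
        yield ordered[:k], ordered[k:]


def sort_orders(rects, axis):
    """Sort by lower, then by upper coordinate along `axis` (0 = x, 1 = y)."""
    return [sorted(rects, key=lambda r: r[axis]),
            sorted(rects, key=lambda r: r[axis + 2])]


def choose_split_axis(rects, m):
    def margin_sum(axis):
        return sum(margin(mbr(g1)) + margin(mbr(g2))
                   for order in sort_orders(rects, axis)
                   for g1, g2 in distributions(order, m))
    return min((0, 1), key=margin_sum)


def choose_split_index(rects, axis, m):
    candidates = [d for order in sort_orders(rects, axis)
                  for d in distributions(order, m)]
    return min(candidates,
               key=lambda d: (overlap(mbr(d[0]), mbr(d[1])),
                              area(mbr(d[0])) + area(mbr(d[1]))))


def split_node(rects, m):
    axis = choose_split_axis(rects, m)
    return choose_split_index(rects, axis, m)


# M = 4, m = 2: five rectangles in two columns, far apart along x.
rects = [(0, 0, 1, 1), (0, 2, 1, 3), (0, 4, 1, 5), (10, 0, 11, 1), (10, 2, 11, 3)]
group1, group2 = split_node(rects, m=2)
print(group1)
print(group2)
```

For this input the margin sum is smaller along x, and the zero-overlap grouping separates the left column from the right one.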

Page 9:

R*-tree

Forced Reinsert:

  • defer splits, by forced-reinsert, i.e.: instead of splitting, temporarily delete some entries, shrink the overflowing MBR, and re-insert those entries
  • Which ones to re-insert?
  • How many? A: 30%
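A minimal sketch of the selection step. The slide leaves "which ones" open; the choice below (evict the entries whose centers lie farthest from the center of the node's MBR) is an assumption based on the usual R*-tree description, and `pick_reinsert` is a hypothetical name.

```python
import math


def mbr(rects):
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


def center(r):
    return ((r[0] + r[2]) / 2, (r[1] + r[3]) / 2)


def pick_reinsert(entries, fraction=0.3):
    """Split the entries of an overflowing node into (kept, to_reinsert)."""
    cx, cy = center(mbr(entries))

    def dist(r):
        x, y = center(r)
        return math.hypot(x - cx, y - cy)

    ordered = sorted(entries, key=dist)         # closest to the node center first
    p = max(1, round(fraction * len(entries)))  # number of entries to evict
    return ordered[:-p], ordered[-p:]


# Ten unit squares in a row; the outermost ones are evicted.
entries = [(i, 0, i + 1, 1) for i in range(10)]
kept, evicted = pick_reinsert(entries)
print(len(kept), len(evicted))
print(mbr(entries), "->", mbr(kept))  # the node MBR shrinks after eviction
```

The evicted entries are then inserted again from the root, which gives them a chance to land in a better-fitting node and may avoid the split altogether.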

Page 10:

R-tree: variations

  • What about static datasets (no ins/del)? Hilbert
  • What about other bounding shapes?

Page 14:

R-trees - variations

  • what about static datasets (no ins/del/upd)?
  • Q: Best way to pack points?
  • A1: plane-sweep; great for queries on ‘x’, terrible for ‘y’
  • Q: how to improve?

Page 16:

R-trees - variations

  • A: plane-sweep on HILBERT curve!
  • In fact, it can be made dynamic (how?), as well as to handle regions (how?)
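Packing along the Hilbert curve can be sketched as follows. The sketch assumes points with integer coordinates on an n x n grid, with n a power of two; `hilbert_value` follows the widely used iterative bit-manipulation formulation of the curve, and `hilbert_pack` is a hypothetical helper name.

```python
def hilbert_value(n, x, y):
    """Position of cell (x, y) along the Hilbert curve on an n x n grid."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the next level sees a canonical orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


def mbr_of_points(points):
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))


def hilbert_pack(points, capacity, n):
    """Sort points by Hilbert value and cut the sorted list into full leaves."""
    ordered = sorted(points, key=lambda p: hilbert_value(n, p[0], p[1]))
    leaves = [ordered[i:i + capacity] for i in range(0, len(ordered), capacity)]
    return [(mbr_of_points(leaf), leaf) for leaf in leaves]


grid = [(x, y) for x in range(4) for y in range(4)]
for leaf_mbr, leaf in hilbert_pack(grid, capacity=4, n=4):
    print(leaf_mbr, leaf)
```

On the 4 x 4 grid each leaf ends up covering one compact 2 x 2 quadrant, whereas a sweep on x alone would produce four tall, thin leaves that are poor for queries on y.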

Page 17:

R-trees - variations

Dynamic (‘Hilbert R-tree’):

  • each point has an ‘h’-value (Hilbert value)
  • insertions: like a B-tree on the h-value
  • but also store MBR, for searches

Page 18:

Hilbert R-tree

Data structure of a node? Each entry holds:

  • LHV (the largest Hilbert value among the data below this entry)
  • MBR: x-low, y-low, x-high, y-high
  • ptr (pointer to the child node)

For everything stored under an entry:

  • h-value <= LHV (~ B-tree)
  • MBRs: inside parent MBR (~ R-tree)
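A rough sketch of how such an entry could serve both roles. It assumes LHV denotes the largest Hilbert value stored below an entry (as defined in [Kamel Faloutsos vldb 94]), that entries inside a node are kept sorted by LHV, and that the class and function names are purely illustrative.

```python
from bisect import bisect_left
from dataclasses import dataclass


@dataclass
class Entry:
    lhv: int      # largest Hilbert value stored below this entry
    mbr: tuple    # (x_low, y_low, x_high, y_high) enclosing everything below
    ptr: object   # child node; left as None in this sketch


def choose_child(entries, h):
    """B-tree-like descent for insertion: first entry with LHV >= h, else the last."""
    keys = [e.lhv for e in entries]   # entries are assumed sorted by LHV
    i = bisect_left(keys, h)
    return entries[min(i, len(entries) - 1)]


def intersects(a, b):
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def children_to_search(entries, query):
    """R-tree-like descent for search: every entry whose MBR meets the query."""
    return [e for e in entries if intersects(e.mbr, query)]


node = [Entry(10, (0, 0, 3, 3), None),
        Entry(20, (2, 2, 6, 6), None),
        Entry(30, (7, 0, 9, 4), None)]

print(choose_child(node, 15).lhv)
print([e.lhv for e in children_to_search(node, (2.5, 2.5, 3.5, 3.5))])
```

Insertion uses only the LHV keys and follows exactly one branch, while a search ignores the keys and may follow several branches, as in an ordinary R-tree.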

Page 22:

R-trees - variations

  • What if we have regions, instead of points?
  • I.e., how to impose a linear ordering (‘h-value’) on rectangles?
  • A1: h-value of center
  • A2: h-value of 4-d point (center, x-radius, y-radius)
  • A3: ...

Page 24:

R-trees - variations

  • with h-values, we can have deferred splits, 2-to-3 splits (3-to-4, etc)
  • Instead of splitting a full node, find the siblings (using the h-values) and redistribute the rectangles among the nodes. Split only when all siblings are full.
  • experimentally: faster than R*-trees (reference: [Kamel Faloutsos vldb 94])
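The deferred-split policy can be sketched as below. This is only an illustration under assumptions: each node is a plain list of `(h_value, rect)` entries, the cooperating siblings are adjacent in Hilbert order, and `handle_overflow` is a hypothetical name.

```python
def handle_overflow(siblings, capacity):
    """Redistribute entries among s cooperating siblings; split s-to-(s+1)
    only when all of them are full.

    siblings: list of nodes, each a list of (h_value, rect) entries,
              one of which has just overflowed.
    Returns the new list of nodes, still ordered by h-value.
    """
    entries = sorted(e for node in siblings for e in node)  # order by h-value
    if len(entries) > capacity * len(siblings):
        n_nodes = len(siblings) + 1   # no room anywhere: create one new node
    else:
        n_nodes = len(siblings)       # room left: just redistribute
    # Spread the entries as evenly as possible while keeping Hilbert order.
    base, extra = divmod(len(entries), n_nodes)
    result, start = [], 0
    for i in range(n_nodes):
        size = base + (1 if i < extra else 0)
        result.append(entries[start:start + size])
        start += size
    return result


r = (0, 0, 1, 1)  # placeholder rectangle; only the h-values matter here
# One sibling has room: no split, entries are redistributed.
print([len(n) for n in handle_overflow([[(1, r), (2, r), (3, r), (4, r)], [(7, r)]], 3)])
# Both siblings are full: 2-to-3 split.
print([len(n) for n in handle_overflow([[(1, r), (2, r), (3, r), (4, r)], [(5, r), (6, r), (7, r)]], 3)])
```

Splitting two full nodes into three leaves each new node about two-thirds full instead of half full, which is where the better space utilization comes from.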

Page 25:

R-trees - variations

  • what about other bounding shapes? (and why?)
  • A1: arbitrary-orientation lines (cell-tree, [Guenther])
  • A2: P-trees (polygon trees) (MB polygon: 0, 90, 45, 135 degree lines)

Page 26:

R-trees - variations

  • A3: L-shapes; holes (hB-tree)
  • A4: TV-trees [Lin+, VLDB-Journal 1994]
  • A5: SR-trees [Katayama+, SIGMOD97] (used in Informedia)

Page 27:

R-trees - conclusions

  • Popular method; like multi-d B-trees
  • guaranteed utilization
  • good search times (for low-dim. at least)
  • Informix ships DataBlade with R-trees

Page 28:

Spatial Queries

  • Given a collection of geometric objects (points, lines, polygons, ...)
  • organize them on disk, to answer
      • point queries
      • range queries
      • k-nn queries
      • spatial joins (‘all pairs’ queries)

Page 33:

R-trees - Range search

pseudocode:

    check the root
    for each branch,
        if its MBR intersects the query rectangle
            apply range-search (or print out, if this is a leaf)
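The pseudocode translates almost line by line into a recursive function. The node layout used here (a dict with a `leaf` flag and a list of `(mbr, child_or_id)` entries) is an assumption made for the sketch, not a layout given in the slides.

```python
def intersects(a, b):
    """True if rectangles a and b (x_low, y_low, x_high, y_high) share a point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def range_search(node, query, out=None):
    """Collect the ids of all data rectangles that intersect `query`."""
    if out is None:
        out = []
    for mbr, item in node["entries"]:
        if intersects(mbr, query):
            if node["leaf"]:
                out.append(item)                # item is an object id
            else:
                range_search(item, query, out)  # item is a child node
    return out


leaf1 = {"leaf": True, "entries": [((0, 0, 1, 1), "A"), ((2, 2, 3, 3), "B")]}
leaf2 = {"leaf": True, "entries": [((6, 6, 7, 7), "C"), ((8, 1, 9, 2), "D")]}
root = {"leaf": False, "entries": [((0, 0, 3, 3), leaf1), ((6, 1, 9, 7), leaf2)]}

print(range_search(root, (2.5, 2.5, 6.5, 6.5)))
```

Branches whose MBR misses the query rectangle are never visited, which is where the savings over a full scan come from.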

Page 35:

R-trees - NN search

  • Q: How? (find near neighbor; refine...)

[Figure: query point q among data rectangles A to J and parent MBRs P1 to P4.]

Page 36:

R-trees - NN search

  • A1: depth-first search; then, range query

[Figure: query point q among data rectangles A to J and parent MBRs P1 to P4.]

Page 39:

R-trees - NN search

  • A2: [Roussopoulos+, sigmod95]: priority queue, with promising MBRs, and their best and worst-case distance
  • main idea: Every face of any MBR contains at least one point of an actual spatial object!

Page 40:

R-trees - NN search

[Figure: query point q among data rectangles A to J and parent MBRs P1 to P4.]

consider only P2 and P4, for illustration

Page 41:

R-trees - NN search

[Figure: query point q with MBR P2 (containing D, E) and MBR P4 (containing H, J); the worst-case distance of P2 and the best-case distance of P4 are marked.]

worst of P2 < best of P4 => P4 is useless for 1-nn

Page 43:

R-trees - NN search

[Figure: query point q, MBR P2, and two red segments.]

  • what is really the worst of, say, P2?
  • A: the smaller of the two red segments!
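The best-case and worst-case distances are usually called MINDIST and MINMAXDIST in [Roussopoulos+, sigmod95]. The sketch below computes both as squared Euclidean distances for 2-d rectangles given as `(x_low, y_low, x_high, y_high)`; the function names are chosen for illustration.

```python
def mindist_sq(q, r):
    """Best case: squared distance from q to the nearest point of rectangle r."""
    lo, hi = r[:2], r[2:]
    total = 0.0
    for qi, s, t in zip(q, lo, hi):
        nearest = min(max(qi, s), t)  # clamp q into [s, t] on this axis
        total += (qi - nearest) ** 2
    return total


def minmaxdist_sq(q, r):
    """Worst case: an upper bound on the squared distance to the nearest object
    in r, valid because every face of an MBR touches some object."""
    lo, hi = r[:2], r[2:]
    dims = range(len(q))
    # Nearer and farther boundary of r along each axis, as seen from q.
    near = [lo[i] if q[i] <= (lo[i] + hi[i]) / 2 else hi[i] for i in dims]
    far = [lo[i] if q[i] >= (lo[i] + hi[i]) / 2 else hi[i] for i in dims]
    far_sq = [(q[i] - far[i]) ** 2 for i in dims]
    total_far = sum(far_sq)
    # For each axis k: nearest face on k, farthest corner on the other axes.
    return min(total_far - far_sq[k] + (q[k] - near[k]) ** 2 for k in dims)


q = (0, 0)
p2 = (1, 1, 3, 2)
p4 = (4, 4, 6, 6)
print(mindist_sq(q, p2), minmaxdist_sq(q, p2))
print(mindist_sq(q, p4) > minmaxdist_sq(q, p2))  # if True, p4 can be pruned for 1-nn
```

In 2-d the minimum in `minmaxdist_sq` is taken over two candidates, one per axis, which matches the "smaller of the two segments" rule.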

Page 44:

R-trees - NN search

  • variations: [Hjaltason & Samet] incremental nn:
      • build a priority queue
      • scan enough of the tree, to make sure you have the k nn
      • to find the (k+1)-th, check the queue, and scan some more of the tree
  • ‘optimal’ (but, may need too much memory)
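A best-first traversal in this spirit can be sketched with a single priority queue that holds both nodes and objects, keyed by their minimum distance to the query. The node layout (dicts with a `leaf` flag and `(mbr, child_or_id)` entries) and the function names are assumptions made for the sketch.

```python
import heapq
from itertools import count, islice


def mindist_sq(q, r):
    """Squared distance from point q to the nearest point of rectangle r."""
    dx = max(r[0] - q[0], 0, q[0] - r[2])
    dy = max(r[1] - q[1], 0, q[1] - r[3])
    return dx * dx + dy * dy


def incremental_nn(root, q):
    """Yield (squared distance, object id) in order of increasing distance."""
    tie = count()  # tie-breaker so the heap never has to compare nodes
    heap = [(0.0, next(tie), False, root)]
    while heap:
        d, _, is_object, item = heapq.heappop(heap)
        if is_object:
            yield d, item  # nothing left in the queue can be closer
            continue
        for mbr, child in item["entries"]:
            # Entries of a leaf are objects; entries of other nodes are nodes.
            heapq.heappush(heap, (mindist_sq(q, mbr), next(tie), item["leaf"], child))


leaf_a = {"leaf": True, "entries": [((1, 1, 1, 1), "a"), ((2, 2, 2, 2), "b")]}
leaf_b = {"leaf": True, "entries": [((8, 8, 8, 8), "c"), ((5, 5, 5, 5), "d")]}
root = {"leaf": False, "entries": [((1, 1, 2, 2), leaf_a), ((5, 5, 8, 8), leaf_b)]}

# Ask for two neighbours now; the generator can be resumed later for more.
print(list(islice(incremental_nn(root, (0, 0)), 2)))
```

Because it is a generator, asking for the (k+1)-th neighbour simply resumes the scan where it stopped; the price is that the queue can grow large, which is the memory caveat noted above.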