Graduate Course Spatial Data

KUT


Korea University of Technology (한국기술대학교), Jun-Ki Min (민준기)


Spatial Data

• Traditional data
– Single-dimensional
– Values, text

• New applications
– GIS (geographic information systems)
– CAD (computer-aided design)
– LBS (location-based services)
– Multimedia data
– Multi-dimensional data


Spatial Access Methods (SAM)

• Support efficient access to spatial data

• B-tree
– Handles only one-dimensional data
– Not appropriate for multi-dimensional data

• One of the best-known spatial indexes: the R-tree


R-Trees: A Dynamic Index Structure for Spatial Searching

• R-Tree
– A height-balanced tree with index records in its leaf nodes containing pointers to data objects
– Dynamic structure: inserts and deletes can be intermixed with searches, and no periodic reorganization is required


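R-Trees: A Dynamic Index Structure for Spatial Searching

• R-Tree
– Exact spatial objects are difficult to index directly
– Based on MBR (minimum bounding rectangle) approximation

[Figure: node R1 with entries A1 and A2, whose child nodes hold the data MBRs a1, a2, a3, a4, shown both as a tree and as nested rectangles]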


R-Tree Structure

• Node = (E_1, …, E_M)

• E_i = (I, pointer), where I = (I_1, …, I_d), d is the number of dimensions, and each I_j = [a, b] is a closed interval

• Let M be the maximum number of entries in a node, and m ≤ M/2 the minimum number of entries in a node
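A minimal Python sketch of this node layout, assuming each MBR I is stored as a tuple of d closed intervals (lo, hi). The class and field names (`Entry`, `Node`, `obj_id`) and the helper functions are illustrative assumptions, not part of the original slides.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# I = (I_1, ..., I_d): one closed interval (a, b) per dimension.
Rect = Tuple[Tuple[float, float], ...]


@dataclass
class Entry:
    mbr: Rect
    child: Optional["Node"] = None  # used in non-leaf nodes
    obj_id: Optional[int] = None    # hypothetical data-object pointer, used in leaf nodes


@dataclass
class Node:
    leaf: bool
    entries: List[Entry] = field(default_factory=list)  # E_1, ..., E_k with k <= M


def area(r: Rect) -> float:
    result = 1.0
    for lo, hi in r:
        result *= hi - lo
    return result


def union(a: Rect, b: Rect) -> Rect:
    """Smallest rectangle enclosing both a and b."""
    return tuple((min(al, bl), max(ah, bh)) for (al, ah), (bl, bh) in zip(a, b))


def node_mbr(node: Node) -> Rect:
    """The I stored in the parent's entry: the MBR of all entries in this node."""
    mbr = node.entries[0].mbr
    for e in node.entries[1:]:
        mbr = union(mbr, e.mbr)
    return mbr


def check_fill(node: Node, m: int, M: int, is_root: bool) -> bool:
    """Non-root nodes must hold between m and M entries."""
    return is_root or m <= len(node.entries) <= M


leaf = Node(leaf=True, entries=[Entry(((0, 1), (0, 1)), obj_id=1),
                                Entry(((2, 3), (1, 4)), obj_id=2)])
print(node_mbr(leaf), check_fill(leaf, m=2, M=4, is_root=False))  # prints ((0, 3), (0, 4)) True
```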


Properties of R-Trees

• Every leaf node contains between m and M index records unless it is the root.

• For each index record (I, pointer) in a leaf node, I is the smallest rectangle that spatially contains the d-dimensional data object represented by the indicated tuple.

• Every non-leaf node has between m and M children unless it is the root.

• For each entry (I, pointer) in a non-leaf node, I is the smallest rectangle that spatially contains the rectangles in the child node.

• The root node has at least two children unless it is a leaf.

• All leaves appear on the same level.


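Properties of R-Trees

• The height of an R-tree containing N index records is at most $\lceil \log_m N \rceil - 1$, because every node except the root has at least m entries
– The maximum number of nodes is $\lceil N/m \rceil + \lceil N/m^2 \rceil + \dots + 1$, where the first term, $\lceil N/m \rceil$, bounds the number of leaf nodes
– Worst-case space utilization for all nodes except the root is m/M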
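Both bounds can be evaluated directly. This sketch assumes m ≥ 2 and N ≥ 1, and uses integer arithmetic so that, for example, $\log_{10} 1000$ is not computed as slightly less than 3; the values of N, m, and M are made up.

```python
def height_bound(N: int, m: int) -> int:
    """ceil(log_m N) - 1, computed with integers to avoid floating-point rounding."""
    k, capacity = 0, 1
    while capacity < N:          # smallest k with m**k >= N
        capacity *= m
        k += 1
    return k - 1


def max_nodes(N: int, m: int) -> int:
    """ceil(N/m) + ceil(N/m^2) + ... + 1 (the first term bounds the leaf count)."""
    total, divisor = 0, m
    while True:
        term = -(-N // divisor)  # integer ceiling of N / divisor
        total += term
        if term == 1:
            return total
        divisor *= m


N, m, M = 1000, 10, 20
print(height_bound(N, m), max_nodes(N, m), m / M)  # prints 2 111 0.5
```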


R-Tree Search

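• Because MBRs may overlap, a search may have to descend into several subtrees, so many index nodes may be visited even when few entries qualify.

Search(MBR)
  if (leaf node) {
    report all entries in this node whose MBRs overlap MBR
  } else {
    for each child node nx whose entry MBR overlaps MBR
      nx.Search(MBR)
  }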
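A runnable Python version of the search pseudocode, assuming the same tuple-of-intervals MBRs; here `search` also returns how many nodes it visited, so the cost of overlapping MBRs is visible. The tree (`R1`, `A1`, `A2`) is a made-up example in which the query overlaps both subtrees but only one leaf entry qualifies.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Rect = Tuple[Tuple[float, float], ...]  # one closed interval (lo, hi) per dimension


@dataclass
class Entry:
    mbr: Rect
    child: Optional["Node"] = None  # non-leaf entries
    obj_id: Optional[int] = None    # leaf entries (hypothetical object pointer)


@dataclass
class Node:
    leaf: bool
    entries: List[Entry] = field(default_factory=list)


def overlaps(a: Rect, b: Rect) -> bool:
    """Closed rectangles overlap iff their intervals overlap in every dimension."""
    return all(al <= bh and bl <= ah for (al, ah), (bl, bh) in zip(a, b))


def search(node: Node, q: Rect, out: List[int]) -> int:
    """Append qualifying object ids to out; return the number of nodes visited."""
    visited = 1
    for e in node.entries:
        if overlaps(e.mbr, q):
            if node.leaf:
                out.append(e.obj_id)
            else:
                visited += search(e.child, q, out)
    return visited


A1 = Node(True, [Entry(((0, 1), (0, 1)), obj_id=1), Entry(((2, 3), (0, 1)), obj_id=2)])
A2 = Node(True, [Entry(((2.5, 4), (1, 2)), obj_id=3), Entry(((5, 6), (0, 1)), obj_id=4)])
R1 = Node(False, [Entry(((0, 3), (0, 1)), child=A1), Entry(((2.5, 6), (0, 2)), child=A2)])

hits: List[int] = []
print(search(R1, ((2.6, 2.8), (0.5, 0.9)), hits), hits)  # prints 3 [2]
```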


R-Tree Insertion

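• Algorithm Insert(newMBR)
– Find the position for the new record
• Call ChooseLeaf to select a leaf node L
– Add the record to leaf node L
• If L is full, call SplitNode
– Propagate changes upward
• Call AdjustTree on L
– Grow the tree taller
• If the root was split, create a new root whose two children are the resulting nodes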


R-Tree Insert

• Algorithm ChooseLeaf
– CL1 Set N to be the root.
– CL2 If N is a leaf, return N. Otherwise, choose the entry in N whose rectangle needs the least area enlargement to include the new data; resolve ties by choosing the entry with the smallest rectangle.
– CL3 Set N to be the child node pointed to by the child pointer of the chosen entry.
– CL4 Repeat from CL2.
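A sketch of ChooseLeaf in Python, with comments keyed to steps CL1 to CL4; the tuple-of-intervals MBRs and the example tree (`A1`, `A2`) are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Rect = Tuple[Tuple[float, float], ...]


@dataclass
class Entry:
    mbr: Rect
    child: Optional["Node"] = None


@dataclass
class Node:
    leaf: bool
    entries: List[Entry] = field(default_factory=list)


def area(r: Rect) -> float:
    result = 1.0
    for lo, hi in r:
        result *= hi - lo
    return result


def union(a: Rect, b: Rect) -> Rect:
    return tuple((min(al, bl), max(ah, bh)) for (al, ah), (bl, bh) in zip(a, b))


def choose_leaf(root: Node, new: Rect) -> Node:
    n = root                                    # CL1
    while not n.leaf:                           # CL2: stop at a leaf
        best = min(n.entries,                   # least enlargement, ties -> smallest area
                   key=lambda e: (area(union(e.mbr, new)) - area(e.mbr), area(e.mbr)))
        n = best.child                          # CL3
    return n                                    # CL4 is the loop back to CL2


A1, A2 = Node(True), Node(True)
root = Node(False, [Entry(((0, 3), (0, 1)), A1), Entry(((2.5, 6), (0, 2)), A2)])
print(choose_leaf(root, ((5.5, 5.8), (1.5, 1.8))) is A2)   # True: A2 needs no enlargement
print(choose_leaf(root, ((2.6, 2.8), (0.5, 0.9))) is A1)   # True: tie at 0, A1 is smaller
```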


R-Tree Insert

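• If the chosen leaf has no room, invoke SplitNode
– Split the M+1 entries into two nodes so that the total area of the two resulting MBRs is minimized

• Optimal SplitNode: examine every way of dividing the M+1 entries into two groups, i.e. $O(2^{M-1})$ candidate splits, which is impractical for large M

[Figure: examples of a bad split and a good split]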
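An exhaustive (optimal) SplitNode as a sketch, assuming "optimal" means minimizing the total area of the two covering rectangles and that each group must receive at least m entries. Fixing entry 0 in the first group avoids trying each partition twice; even so, the loop examines $2^M - 1$ groupings of M+1 entries, which is why this is only usable for very small M. The rectangles are made up.

```python
from itertools import combinations
from typing import List, Optional, Tuple

Rect = Tuple[Tuple[float, float], ...]


def area(r: Rect) -> float:
    result = 1.0
    for lo, hi in r:
        result *= hi - lo
    return result


def cover(rects: List[Rect]) -> Rect:
    """MBR of a non-empty list of rectangles."""
    return tuple((min(r[k][0] for r in rects), max(r[k][1] for r in rects))
                 for k in range(len(rects[0])))


def exhaustive_split(rects: List[Rect], m: int) -> Optional[Tuple[float, List[int], List[int]]]:
    """Try every two-group partition of the M+1 entries; keep the smallest total area."""
    n = len(rects)
    best = None
    for k in range(n - 1):                          # entry 0 always goes to group 1,
        for combo in combinations(range(1, n), k):  # so each partition is tried once
            g1 = [0, *combo]
            g2 = [i for i in range(1, n) if i not in combo]
            if len(g1) < m or len(g2) < m:          # both nodes need >= m entries
                continue
            cost = area(cover([rects[i] for i in g1])) + area(cover([rects[i] for i in g2]))
            if best is None or cost < best[0]:
                best = (cost, g1, g2)
    return best


rects = [((0, 1), (0, 1)), ((0.5, 1.5), (0, 1)), ((10, 11), (10, 11)), ((10.5, 11.5), (10, 11))]
print(exhaustive_split(rects, m=1))  # prints (3.0, [0, 1], [2, 3])
```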


R-Tree Insert

• Approximate algorithms (see the paper for details)
– Quadratic split: $O(M^2)$
– Linear split: $O(M)$

• Select as seeds the two entries that are farthest apart
• Assign the remaining entries to the two groups
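A sketch of the quadratic variant, written from Guttman's QuadraticSplit as we understand it: the seeds are the pair that would waste the most area if placed in one node, and each remaining entry is assigned in order of how strongly it prefers one group. The linear variant differs mainly in how it picks the seeds (the pair with the greatest normalized separation along some dimension). Passing rectangles as a list and returning index lists are conventions assumed for this sketch.

```python
from itertools import combinations
from typing import List, Tuple

Rect = Tuple[Tuple[float, float], ...]


def area(r: Rect) -> float:
    result = 1.0
    for lo, hi in r:
        result *= hi - lo
    return result


def union(a: Rect, b: Rect) -> Rect:
    return tuple((min(al, bl), max(ah, bh)) for (al, ah), (bl, bh) in zip(a, b))


def enlargement(cover: Rect, r: Rect) -> float:
    return area(union(cover, r)) - area(cover)


def quadratic_split(rects: List[Rect], m: int) -> List[List[int]]:
    # PickSeeds: the pair that would waste the most area if put in the same node.
    s1, s2 = max(combinations(range(len(rects)), 2),
                 key=lambda p: area(union(rects[p[0]], rects[p[1]]))
                 - area(rects[p[0]]) - area(rects[p[1]]))
    groups, covers = [[s1], [s2]], [rects[s1], rects[s2]]
    rest = [i for i in range(len(rects)) if i not in (s1, s2)]
    while rest:
        # If a group needs every remaining entry to reach m, give them all to it.
        for g in (0, 1):
            if len(groups[g]) + len(rest) == m:
                groups[g].extend(rest)
                return groups
        # PickNext: the entry with the strongest preference for one group.
        k = max(rest, key=lambda i: abs(enlargement(covers[0], rects[i])
                                         - enlargement(covers[1], rects[i])))
        rest.remove(k)
        # Least enlargement, then smaller area, then fewer entries.
        g = min((0, 1), key=lambda g: (enlargement(covers[g], rects[k]),
                                        area(covers[g]), len(groups[g])))
        groups[g].append(k)
        covers[g] = union(covers[g], rects[k])
    return groups


rects = [((0, 1), (0, 1)), ((0.5, 1.5), (0, 1)), ((10, 11), (10, 11)), ((10.5, 11.5), (10, 11))]
print(quadratic_split(rects, m=2))  # prints [[0, 1], [3, 2]]
```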


R-Tree Insertion

• AdjustTree ascends from leaf node L to the root, adjusting covering rectangles and propagating node splits as necessary

• AdjustTree algorithm
– [Initialize] Set N = L; if L was split earlier, set NN to the resulting second node
– [Check if done] If N is the root, stop
– [Adjust covering rectangle in parent entry] Let P be the parent of N and E_N be N's entry in P; modify E_N's MBR to tightly enclose all MBRs in N
– [Propagate node split upward] If N has a partner NN resulting from an earlier split, create a new entry E_NN pointing to NN and enclosing all MBRs in NN, and add E_NN to P; if P has no room, invoke SplitNode to produce P and PP
– [Move up to next level] Set N = P and NN = PP (if a split occurred), and go to [Check if done]
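A compact sketch of the whole insertion path (a simplified ChooseLeaf, AdjustTree, and growing a new root), assuming entries are (MBR, child-or-object) pairs and nodes keep parent pointers. To keep it short, `split` is a hypothetical stand-in (sort on the lower x-bound and cut in half), not Guttman's quadratic or linear SplitNode; a real split would be swapped in.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional, Tuple

Rect = Tuple[Tuple[float, float], ...]


def union(a: Rect, b: Rect) -> Rect:
    return tuple((min(al, bl), max(ah, bh)) for (al, ah), (bl, bh) in zip(a, b))


def area(r: Rect) -> float:
    result = 1.0
    for lo, hi in r:
        result *= hi - lo
    return result


@dataclass(eq=False)
class Node:
    leaf: bool
    entries: List[Tuple[Rect, Any]] = field(default_factory=list)  # (MBR, child node or object id)
    parent: Optional["Node"] = field(default=None, repr=False)


def cover(node: Node) -> Rect:
    mbr = node.entries[0][0]
    for r, _ in node.entries[1:]:
        mbr = union(mbr, r)
    return mbr


def split(node: Node) -> Node:
    """Hypothetical stand-in for SplitNode: sort on the lower x-bound and halve."""
    node.entries.sort(key=lambda e: e[0][0][0])
    half = len(node.entries) // 2
    nn = Node(node.leaf, node.entries[half:], node.parent)
    node.entries = node.entries[:half]
    if not nn.leaf:
        for _, child in nn.entries:
            child.parent = nn            # children moved to NN now hang under NN
    return nn


def adjust_tree(n: Node, nn: Optional[Node], M: int) -> Tuple[Node, Optional[Node]]:
    while n.parent is not None:          # [Check if done]
        p = n.parent
        # [Adjust covering rectangle in parent entry]: E_N := MBR of N
        p.entries = [(cover(n), c) if c is n else (r, c) for r, c in p.entries]
        pp = None
        if nn is not None:               # [Propagate node split upward]
            nn.parent = p
            p.entries.append((cover(nn), nn))
            if len(p.entries) > M:
                pp = split(p)
        n, nn = p, pp                    # [Move up to next level]
    return n, nn                         # n is the root; nn is set if the root split


def insert(root: Node, rect: Rect, obj: Any, M: int) -> Node:
    leaf = root
    while not leaf.leaf:                 # simplified ChooseLeaf
        leaf = min(leaf.entries, key=lambda e: (area(union(e[0], rect)) - area(e[0]), area(e[0])))[1]
    leaf.entries.append((rect, obj))
    nn = split(leaf) if len(leaf.entries) > M else None
    top, sibling = adjust_tree(leaf, nn, M)
    if sibling is None:
        return top
    new_root = Node(False, [(cover(top), top), (cover(sibling), sibling)])  # grow the tree taller
    top.parent = sibling.parent = new_root
    return new_root


root = Node(True)
for i in range(20):
    root = insert(root, ((float(i % 7), i % 7 + 1.0), (float(i), i + 1.0)), i, M=4)
```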


Processing and Optimization of Multiway Spatial Joins Using R-trees

• Cost-based query optimizer
– Join selectivity: the probability that a tuple (a combination of objects) belongs to the result
– Used to generate the most efficient query execution plan

• Spatial join selectivity
– Defined on multi-dimensional attributes (commonly two-dimensional)

• This work focuses on computing the cost of the filter step (i.e., only MBRs are considered)


Previous Work

• Assumptions
– Workspace: the d-dimensional unit space $[0,1)^d$
– Data are uniformly distributed
– Each dimension is independent


Previous Work

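• Window query: find all points that lie inside window q
– $S(q) = \prod_{i=1}^{d} |q_i|$, where $|q_i|$ is the extent of q in dimension i (for a square window with side |q|, $S(q) = |q|^d$)
– In two dimensions, $S(q) = |q_x| \cdot |q_y|$
– Because points are uniformly distributed in $[0,1)^d$, S(q) is the probability that a point falls inside q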
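A sketch checking the window-query formula under the stated assumptions (uniform, independent coordinates in $[0,1)^d$): the formula value is compared with the fraction of random points that land in q. The window, sample size, and seed are arbitrary choices.

```python
import random
from math import prod
from typing import Sequence, Tuple


def window_selectivity(q: Sequence[Tuple[float, float]]) -> float:
    """S(q) = prod_i |q_i| for a window q inside the unit workspace [0,1)^d."""
    return prod(hi - lo for lo, hi in q)


def empirical_selectivity(q: Sequence[Tuple[float, float]], n: int = 100_000, seed: int = 0) -> float:
    """Fraction of n uniform random points (independent coordinates) that fall inside q."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        p = [rng.random() for _ in q]
        if all(lo <= x <= hi for x, (lo, hi) in zip(p, q)):
            hits += 1
    return hits / n


q = ((0.1, 0.4), (0.2, 0.7))  # |q_x| = 0.3, |q_y| = 0.5
print(window_selectivity(q), empirical_selectivity(q))  # both close to 0.15
```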


Previous Work

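• 2-way join query: find all pairs of rectangles from $R_a$ and $R_b$ that intersect
– $S(R_a, R_b) = (|S_a| + |S_b|)^d$, where $|S_i|$ is the average size of the rectangles of $R_i$ in one dimension and d is the dimension
– Reasoning: ignoring boundary effects, in one dimension a rectangle of $R_b$ intersects one of $R_a$ iff its lower end falls in an interval of length $|S_a| + |S_b|$; with independent dimensions the probabilities multiply
– In two dimensions, $S(R_a, R_b) = (|S_{a,x}| + |S_{b,x}|)(|S_{a,y}| + |S_{b,y}|)$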
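A sketch of the 2-way estimate next to a simulation. So that the simulation matches the formula exactly, it assumes a wrap-around (toroidal) workspace, which removes the boundary effects the formula ignores, and it assumes $|S_a| + |S_b| \le 1$; the sizes 0.1 and 0.05 are made up.

```python
import random


def join_selectivity(sa: float, sb: float, d: int) -> float:
    """S(Ra, Rb) = (|Sa| + |Sb|)^d."""
    return (sa + sb) ** d


def simulated_selectivity(sa: float, sb: float, d: int, n: int = 200_000, seed: int = 1) -> float:
    """Fraction of random pairs of boxes (sides sa and sb) that intersect on a torus."""
    rng = random.Random(seed)
    hits = 0
    for _trial in range(n):
        intersect = True
        for _dim in range(d):
            u = (rng.random() - rng.random()) % 1.0  # offset of b's start from a's start
            if not (u <= sa or u >= 1.0 - sb):       # b starts inside a, or a starts inside b
                intersect = False
                break
        hits += intersect
    return hits / n


print(join_selectivity(0.1, 0.05, 2), simulated_selectivity(0.1, 0.05, 2))  # both close to 0.0225
```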


Previous Work

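• M-way linear queries (acyclic queries), e.g. $R_a$ intersects $R_b$ and $R_b$ intersects $R_c$
– $S(R_a, R_b, R_c) = (|S_a| + |S_b|)^d \, (|S_b| + |S_c|)^d$
– Generalization: $S = \prod_{\forall i,j:\, Q(i,j) = \text{TRUE}} (|S_i| + |S_j|)^d$, where Q(i, j) is TRUE when the query joins $R_i$ and $R_j$

[Figure: a chain of rectangles with sizes |S_a|, |S_b|, |S_c|]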
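The generalized product for acyclic queries as a short sketch; the relation names, sizes, and edge-list representation of the query graph are made-up assumptions.

```python
from math import prod
from typing import Dict, List, Tuple


def acyclic_selectivity(sizes: Dict[str, float], edges: List[Tuple[str, str]], d: int) -> float:
    """Product of (|S_i| + |S_j|)^d over every pair (i, j) joined in the query graph."""
    return prod((sizes[i] + sizes[j]) ** d for i, j in edges)


sizes = {"a": 0.1, "b": 0.2, "c": 0.05}   # made-up average extents per dimension
chain = [("a", "b"), ("b", "c")]           # Ra intersects Rb, Rb intersects Rc
print(acyclic_selectivity(sizes, chain, d=2))  # (0.3)^2 * (0.25)^2, about 0.005625
```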


Previous Work

• M-way clique join query (M ≥ 3)
– Papadias, Mamoulis, Theodoridis (ACM PODS '99)
– Clique: if a set of rectangles mutually intersect, then they must share a common area

[Figure: query graph over R1, R2, R3 (a clique) and the corresponding spatial relationship of rectangles s1, s2, s3]


Previous Work

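– Common area $q_n$: the average extent, in each dimension, of the region shared by n mutually intersecting rectangles $s_1, \dots, s_n$:
$|q_n| = \frac{\prod_{i=1}^{n} |s_i|}{\sum_{i=1}^{n} \prod_{j=1,\, j \neq i}^{n} |s_j|}$
– Proof (by induction). Two rectangles, one dimension, assuming $|s_1| \ge |s_2|$ (the result is symmetric): given that $s_2$ intersects $s_1$, the start of $s_2$ is uniform over a range of length $|s_1| + |s_2|$, giving three cases:
• $s_2$ overlaps the left end of $s_1$: probability $\frac{|s_2|}{|s_1| + |s_2|}$, representative (average) overlap $\frac{|s_2|}{2}$
• $s_2$ lies inside $s_1$: probability $\frac{|s_1| - |s_2|}{|s_1| + |s_2|}$, representative overlap $|s_2|$
• $s_2$ overlaps the right end of $s_1$: probability $\frac{|s_2|}{|s_1| + |s_2|}$, representative overlap $\frac{|s_2|}{2}$
– Weighting each representative value by its probability:
$|q_2| = \frac{|s_2| \cdot \frac{|s_2|}{2} + (|s_1| - |s_2|)\,|s_2| + |s_2| \cdot \frac{|s_2|}{2}}{|s_1| + |s_2|} = \frac{|s_1||s_2|}{|s_1| + |s_2|}$
– Inductive step: treating $q_{n-1}$ as a rectangle and intersecting it with $s_n$ gives $|q_n| = \frac{|q_{n-1}||s_n|}{|q_{n-1}| + |s_n|}$, i.e. $\frac{1}{|q_n|} = \frac{1}{|q_{n-1}|} + \frac{1}{|s_n|} = \sum_{i=1}^{n} \frac{1}{|s_i|}$, which equals the formula above.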
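The closed form and the inductive recurrence for the common area can be checked against each other, and the two-rectangle case can be checked by averaging the overlap length over every position of $s_2$ that intersects $s_1$ (midpoint rule). Function names and the sample sizes are illustrative assumptions.

```python
from math import prod
from typing import List


def common_extent(s: List[float]) -> float:
    """|q_n| = prod_i |s_i| / sum_i prod_{j != i} |s_j|."""
    n = len(s)
    return prod(s) / sum(prod(s[j] for j in range(n) if j != i) for i in range(n))


def common_extent_recursive(s: List[float]) -> float:
    """Inductive step |q_k| = |q_{k-1}| |s_k| / (|q_{k-1}| + |s_k|), starting from q_1 = s_1."""
    q = s[0]
    for sk in s[1:]:
        q = q * sk / (q + sk)
    return q


def mean_overlap_1d(s1: float, s2: float, steps: int = 100_000) -> float:
    """Average overlap when s2's start is uniform over (-s2, s1), i.e. s2 intersects [0, s1]."""
    width = s1 + s2
    total = 0.0
    for k in range(steps):
        x = -s2 + (k + 0.5) * width / steps      # midpoint rule
        total += min(s1, x + s2) - max(0.0, x)   # length of [max(0, x), min(s1, x + s2)]
    return total / steps


print(common_extent([0.3, 0.1]), mean_overlap_1d(0.3, 0.1))   # both about 0.075
print(common_extent([0.3, 0.1, 0.2]), common_extent_recursive([0.3, 0.1, 0.2]))
```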


Previous Work

– Selectivity of an M-way clique join query (three inputs):
Prob(s2 intersects s1) × Prob(s3 intersects s1 ∧ s3 intersects s2 | s1, s2 mutually intersect) = Prob(s2 intersects s1) × Prob(s3 intersects the common intersection area of s1 and s2)
$= (|s_1| + |s_2|)^d \left(\frac{|s_1||s_2|}{|s_1| + |s_2|} + |s_3|\right)^d = \left(|s_1||s_2| + |s_1||s_3| + |s_2||s_3|\right)^d$

– General case:
$S = \left(\sum_{i=1}^{n} \prod_{j=1,\, j \neq i}^{n} |s_j|\right)^d$
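A sketch computing the general clique formula both directly and as the chain of conditional probabilities described for three inputs: each new rectangle must hit the current common area, which then shrinks according to $|q_k| = |q_{k-1}||s_k| / (|q_{k-1}| + |s_k|)$. The sizes are made up.

```python
from math import prod
from typing import List


def clique_selectivity(s: List[float], d: int) -> float:
    """(sum_i prod_{j != i} |s_j|)^d for n mutually intersecting inputs."""
    n = len(s)
    return sum(prod(s[j] for j in range(n) if j != i) for i in range(n)) ** d


def clique_selectivity_chain(s: List[float], d: int) -> float:
    """Prob(s2 hits s1) * Prob(s3 hits q_2) * ..., built one input at a time."""
    sel, q = 1.0, s[0]
    for sk in s[1:]:
        sel *= (q + sk) ** d   # next rectangle must intersect the current common area
        q = q * sk / (q + sk)  # the common area shrinks
    return sel


sizes = [0.1, 0.2, 0.05]  # made-up average extents per dimension
print(clique_selectivity(sizes, 2), clique_selectivity_chain(sizes, 2))  # both about 0.001225
```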
