Trees for spatial indexing Part 2 : SAMs. SAMs R-Tree XTV R*-Tree.

Trees for spatial indexing

Part 2 : SAMs

SAMs

R-Tree

X TV

R*-Tree

Answering questions

• The kd-trie is similar to the kd-tree. In the article, the term was used for the kd-tree.

• The split position is not the middle of the region; it is chosen at the median point.

• Because we work with points, we have no problem separating the elements.
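
The median split described above can be sketched in a few lines of Python. This is only an illustration: the dictionary node layout and the rule of cycling through the axes by depth are assumptions, since the slide does not say how the split axis is picked.

```python
def build_kdtree(points, depth=0):
    """Build a kd-tree by splitting at the median point (not the spatial middle)."""
    if not points:
        return None
    axis = depth % len(points[0])           # assumption: cycle through the axes
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2                 # index of the median point
    return {
        "point": ordered[mid],              # the median point defines the split
        "axis": axis,
        "left": build_kdtree(ordered[:mid], depth + 1),
        "right": build_kdtree(ordered[mid + 1:], depth + 1),
    }


tree = build_kdtree([(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)])
```

Because every element is a point, each one falls cleanly on one side of the split, which is why no element has to be cut or duplicated.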

UB-Tree range queries

• The algorithm is:

• Find every region that intersects q.

– If this region is a page, all of its objects that intersect q are in the answer.

– After that, we look for the last subcube in this region and then for its brother; if the brother intersects q, we run the same loop on it.

– After that, we move up to the father of B and search again.
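
As a rough illustration of the idea behind this range query, the sketch below stores 2-D points in Z-order (the UB-Tree keys its regions by Z-addresses) and answers a query box q by scanning the Z-interval between its two corners. This is a deliberate simplification, not the region-skipping algorithm outlined above: the function names, the 8-bit coordinates, and the flat sorted list standing in for the tree pages are all assumptions.

```python
import bisect


def z_address(x, y, bits=8):
    """Interleave the bits of x and y into a single Z-order address."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


def range_query(points, lo, hi, bits=8):
    """Return the points inside the box [lo, hi] (corners included)."""
    entries = sorted((z_address(x, y, bits), (x, y)) for x, y in points)
    keys = [key for key, _ in entries]
    # Every point of the box has a Z-address between those of its corners.
    start = bisect.bisect_left(keys, z_address(lo[0], lo[1], bits))
    end = bisect.bisect_right(keys, z_address(hi[0], hi[1], bits))
    # The Z-interval also covers points outside the box, so filter them out.
    return [p for _, p in entries[start:end]
            if lo[0] <= p[0] <= hi[0] and lo[1] <= p[1] <= hi[1]]


found = range_query([(1, 1), (2, 3), (3, 2), (5, 5), (0, 7), (4, 1)],
                    lo=(1, 1), hi=(3, 3))
```

The filtering step is where this sketch wastes work; the brother and father navigation in the steps above exists precisely to skip regions that do not intersect q.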

R-Tree

• A B+-Tree-like structure specialized for spatial indexing.

• The performance of the R*-Tree decreases as the dimensionality grows.

• The R-Tree access method is prohibitively slow for dimensions higher than 5.

Problems of (R-Tree based) Index Structures

• It has been shown that overlap grows as the dimensionality increases.

• Intuitively, there is overlap when some point queries have multiple paths to search.

Definition of overlap

• Intuitively, overlap is the percentage of the volume that is covered by more than one directory hyperrectangle.

• This intuitive definition of overlap is directly correlated with query performance,

• because overlap implies multiple search paths.

Definition of the overlap (2)

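• $\mathrm{Overlap} = \dfrac{\left\| \bigcup_{i,j,\ i \neq j} (R_i \cap R_j) \right\|}{\left\| \bigcup_i R_i \right\|}$

• We take the volume of the union of all pairwise intersections of the MBRs and divide it by the volume of the union of all the MBRs.

• But overlap in highly populated areas is much more critical than overlap in sparsely populated areas.

• $\mathrm{WeightedOverlap} = \dfrac{\left| \{ p \mid p \in \bigcup_{i,j,\ i \neq j} (R_i \cap R_j) \} \right|}{\left| \{ p \mid p \in \bigcup_i R_i \} \right|}$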

[Figure: example MBRs; both axes labeled 1]

Overlap = (¼)/(2) = 1/8 = 12.5 %

WeightedOverlap = (2)/(6) = 1/3 ≈ 33 %
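
Both measures can be computed directly for a small case. The sketch below handles exactly two axis-aligned MBRs, each given as a (low corner, high corner) pair; the rectangles and points are made up for illustration and are not the ones from the figure.

```python
def contains(rect, p):
    """True if point p lies inside the axis-aligned rectangle (lo, hi)."""
    lo, hi = rect
    return all(l <= x <= h for l, x, h in zip(lo, p, hi))


def volume(rect):
    lo, hi = rect
    v = 1.0
    for l, h in zip(lo, hi):
        v *= h - l
    return v


def intersection(a, b):
    """Intersection rectangle of a and b, or None if they do not overlap."""
    lo = tuple(max(x, y) for x, y in zip(a[0], b[0]))
    hi = tuple(min(x, y) for x, y in zip(a[1], b[1]))
    return (lo, hi) if all(l < h for l, h in zip(lo, hi)) else None


def overlap(a, b):
    """Volume of the intersection divided by the volume of the union (two MBRs only)."""
    inter = intersection(a, b)
    iv = volume(inter) if inter else 0.0
    return iv / (volume(a) + volume(b) - iv)


def weighted_overlap(rects, points):
    """Points covered by two or more MBRs divided by points covered by any MBR."""
    covered = [p for p in points if any(contains(r, p) for r in rects)]
    shared = [p for p in covered if sum(contains(r, p) for r in rects) >= 2]
    return len(shared) / len(covered)


r1 = ((0.0, 0.0), (2.0, 2.0))
r2 = ((1.0, 1.0), (3.0, 3.0))
pts = [(0.5, 0.5), (1.5, 1.5), (1.2, 1.8), (2.5, 2.5), (5.0, 5.0)]
print(overlap(r1, r2), weighted_overlap([r1, r2], pts))
```

Here the intersection has volume 1 and the union has volume 7, while two of the four covered points lie in the shared area, so the weighted measure is much larger than the volume-based one for this clustered sample.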

Overlap / WeightedOverlap

• Which measure is appropriate depends on the kind of data.

• If the data points are uniformly distributed, we can use the Overlap measure.

• With real data we can have clustering, so WeightedOverlap is more accurate.

X-Tree

• Avoid overlap in the directory.

• The X-Tree is a hybrid of a linear, array-like directory and a hierarchical, R-Tree-like directory.

• In low dimensions, the most efficient organization of the directory is hierarchical.

• For high dimensionality, a linear organization is more efficient.

X-Tree

• In the X-Tree we have 3 types of nodes: data nodes, normal directory nodes, and supernodes.

• The supernodes avoid splits in the directory, so searching is faster.

• This is not the same as an R*-Tree with larger blocks, because the X-Tree creates larger nodes only if necessary.

X-Tree

[Figure: X-Tree structure, showing supernodes, normal directory nodes, and data nodes]

Creation of supernodes

• Supernodes are created only when there is no other way to avoid overlap during insertion.

TV-Tree (Telescopic-Vector tree)

• The basic idea of the TV-Tree is to use feature vectors that contract and extend dynamically (as in classification).

TV-Tree

• An m-contraction of x is the vector $A_m x$, where $A_m$ is a contraction matrix.

• A natural $A_m$ is:

• $\begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & \cdots & 0 & 1 \end{pmatrix}$
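
A small sketch may make the contraction concrete. It assumes the natural contraction matrix is the m × n matrix with 1s on the diagonal and 0s elsewhere, so that multiplying by it simply keeps the first m components of x; the function names are illustrative.

```python
def contraction_matrix(m, n):
    """m x n matrix with 1s on the diagonal and 0s elsewhere (assumed shape)."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(m)]


def contract(x, m):
    """Return the m-contraction A_m x of the feature vector x."""
    a_m = contraction_matrix(m, len(x))
    return [sum(a * v for a, v in zip(row, x)) for row in a_m]


x = [3, 1, 4, 1, 5]
print(contract(x, 2))  # keeps the first two features
print(contract(x, 5))  # the full vector, nothing is dropped
```

Growing m extends the vector again, which is the "telescoping" that gives the tree its name.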

Multiple shapes

• We can use, for example, a sphere, because it needs only a center and a radius r. It represents the set of points whose Euclidean distance from the center is ≤ r.

• The Euclidean distance is a special case of the Lp metrics, with p = 2.

• For the L1 metric (Manhattan distance), the sphere has a diamond shape.

• The TV-Tree works with any Lp-sphere.
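
The difference between the shapes shows up in a membership test. The sketch below computes the Lp distance and checks whether a point lies inside an Lp-sphere; the function names and the sample point are illustrative only.

```python
def lp_distance(a, b, p=2.0):
    """Lp distance between a and b (p=2 is Euclidean, p=1 is Manhattan)."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def in_lp_sphere(point, center, radius, p=2.0):
    """True if point is within the given radius of center under the Lp metric."""
    return lp_distance(point, center, p) <= radius


center = (0.0, 0.0)
point = (0.7, 0.7)
print(in_lp_sphere(point, center, 1.0, p=2.0))  # inside the Euclidean circle
print(in_lp_sphere(point, center, 1.0, p=1.0))  # outside the L1 diamond
```

The same point is accepted by the unit L2-sphere and rejected by the unit L1-sphere, because the diamond is pinched along the diagonals.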

Tv-Tree principle

• So the TV-Tree treats the attributes asymmetrically, favoring the first few features over the rest.

• The TV-Tree can use any type of MBR (minimum bounding region): rectangle, cube, sphere, etc.

• The TV-Tree can use any Lp-sphere.

TV-Tree node structure

• Each node represents the MBR of all its descendants (say, an Lp-sphere).

• Each region is represented by a center, which is a telescopic vector, and a radius.

• So we talk about a TMBR (telescopic MBR).
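
The slide does not spell out when a point belongs to a TMBR, so the following sketch rests on an assumed rule: the point must match the center exactly on the leading (inactive) dimensions, and its Lp distance to the center over the trailing active dimensions must not exceed the radius. The function name and parameters are hypothetical.

```python
def in_tmbr(point, center, radius, num_active, p=2.0):
    """Containment test for a telescopic Lp-sphere (assumed rule, see above).

    center is the telescopic vector; its last num_active entries are the
    active dimensions. Features of point beyond len(center) are ignored.
    """
    k = len(center)
    inactive = k - num_active
    # Inactive dimensions must agree exactly with the center.
    if any(point[i] != center[i] for i in range(inactive)):
        return False
    # Only the active dimensions contribute to the distance.
    dist = sum(abs(point[i] - center[i]) ** p
               for i in range(inactive, k)) ** (1.0 / p)
    return dist <= radius


center = (1.0, 2.0, 3.0)
print(in_tmbr((1.0, 2.0, 3.4, 9.9), center, 0.5, num_active=1))
print(in_tmbr((1.0, 2.5, 3.0, 0.0), center, 0.5, num_active=1))
```

The first point is accepted although its fourth feature is far from anything in the center, because the center has been contracted to three dimensions and the later features are not looked at.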

TV-1-Tree example

TV-2-Tree example

[Figures: example TMBRs labeled with their active dimensions (y; x; z; x,z; x,y)]

What is the best number of active dimensions?

• They found that the best number of active dimensions was two.

TV-Tree conclusion

• We accept overlap, and therefore multiple search paths.

• The branch for a new point is chosen according to the following criteria:
