Trees for spatial indexing, Part 2: SAMs
SAMs: R-Tree, X-Tree, TV-Tree, R*-Tree
Answering questions
• The Kd-Trie is similar to the kd-tree. In the article, the name was used for the kd-tree.
• The split position isn't the middle of the axis; it is chosen at the median point.
• Because we work with points, we have no problem separating the elements.
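The median split described above can be sketched as follows. This is a minimal illustration, not the construction from the article: the names `KdNode` and `build_kd` are hypothetical, and cycling through the axes by depth is an assumption.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

Point = Tuple[float, ...]

@dataclass
class KdNode:
    point: Point                      # median point stored at this node
    axis: int                         # axis used to split at this level
    left: Optional["KdNode"] = None   # points before the median on `axis`
    right: Optional["KdNode"] = None  # points after the median on `axis`

def build_kd(points: Sequence[Point], depth: int = 0) -> Optional[KdNode]:
    """Build a kd-tree by splitting at the median point, not the middle of the axis."""
    if not points:
        return None
    axis = depth % len(points[0])     # assumption: cycle through the axes
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2           # index of the median point
    return KdNode(
        point=ordered[mid],
        axis=axis,
        left=build_kd(ordered[:mid], depth + 1),
        right=build_kd(ordered[mid + 1:], depth + 1),
    )

root = build_kd([(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)])
print(root.point, root.left.point, root.right.point)
```

Because the data are points, every element falls cleanly on one side of the split, which is why no element has to be cut or duplicated.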
UB-Tree range queries
• The algorithm is:
– Find all regions that intersect q. If such a region is a page, all of its objects that intersect q are in the answer.
– After that, we search for the last subcube in this region and look at its brother; if it intersects q, we run the same loop on it.
– After that, we look at the father of B and search again.
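As a rough illustration only, here is a much simpler technique than the traversal above: a range query over points ordered by Z-address, the ordering a UB-Tree is built on. It scans the whole Z-interval between the two corners of q and filters, whereas the algorithm above skips regions that do not intersect q. The names `z_address` and `range_query`, the 2D integer coordinates, and the 16-bit width are all assumptions.

```python
from bisect import bisect_left, bisect_right

def z_address(x: int, y: int, bits: int = 16) -> int:
    """Interleave the bits of x and y (x in even positions, y in odd positions)."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z

def range_query(points, lo, hi):
    """Return the points inside the box [lo, hi], using the Z-order as the index."""
    ordered = sorted(points, key=lambda p: z_address(*p))
    keys = [z_address(*p) for p in ordered]
    # Every point in the box has a Z-address between those of the two corners.
    start = bisect_left(keys, z_address(*lo))
    stop = bisect_right(keys, z_address(*hi))
    # The interval also contains points outside the box, so filter them out.
    return [p for p in ordered[start:stop]
            if lo[0] <= p[0] <= hi[0] and lo[1] <= p[1] <= hi[1]]

pts = [(1, 1), (2, 3), (3, 2), (5, 5), (6, 1), (0, 7)]
hits = range_query(pts, (1, 1), (3, 3))
print(hits)
```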
R-Tree
• A special B+-Tree for spatial indexing.
• The performance of the R*-Tree decreases as the dimensionality increases.
• R-Tree-based access methods are prohibitively slow for more than 5 dimensions.
Problems of (R-Tree based) Index Structures
• It has been shown that overlap increases with the dimensionality.
• Intuitively, overlap means that some point queries have multiple paths to search.
Definition of overlap
• Intuitively, overlap is the percentage of the volume that is covered by more than one directory hyperrectangle.
• This intuitive definition of overlap is directly correlated to the query performance,
• because overlap implies multiple search paths.
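To make the "multiple paths" effect concrete, here is a small sketch of a point query over a two-level directory. The `Node` class and the example rectangles are hypothetical; the query deliberately visits every matching child so that the number of inspected nodes can be counted.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Point = Tuple[float, float]
Rect = Tuple[Point, Point]   # (lower corner, upper corner)

@dataclass
class Node:
    mbr: Rect
    children: List["Node"] = field(default_factory=list)  # empty for data nodes
    points: List[Point] = field(default_factory=list)     # used by data nodes only

def contains(mbr: Rect, p: Point) -> bool:
    lo, hi = mbr
    return all(l <= x <= h for l, x, h in zip(lo, p, hi))

def point_query(node: Node, p: Point, visited: List[Node]) -> bool:
    """Search for p below `node`, recording every node that has to be read."""
    visited.append(node)
    if not node.children:
        return p in node.points
    found = False
    for child in node.children:
        if contains(child.mbr, p):          # every overlapping MBR is a path
            found = point_query(child, p, visited) or found
    return found

a = Node(mbr=((0, 0), (2, 2)), points=[(0, 0), (2, 2)])
b = Node(mbr=((1, 1), (3, 3)), points=[(1, 1), (3, 3)])
root = Node(mbr=((0, 0), (3, 3)), children=[a, b])

in_overlap, clear = [], []
point_query(root, (1.5, 1.5), in_overlap)   # lies in both MBRs
point_query(root, (0.5, 0.5), clear)        # lies in one MBR only
print(len(in_overlap), len(clear))
```

The query point inside the overlapping area forces both subtrees to be read, while the other one follows a single path.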
Definition of the overlap (2)
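• $\mathrm{Overlap} = \dfrac{\left\| \bigcup_{i,j,\ i \neq j} (R_i \cap R_j) \right\|}{\left\| \bigcup_i R_i \right\|}$
• We take the volume of the union of all pairwise intersections of the MBRs and divide it by the volume of the union of all the MBRs.
• But overlap in highly populated areas is much more critical than overlap in sparsely populated areas.
• $\mathrm{WeightedOverlap} = \dfrac{\left| \{\, p \mid p \in \bigcup_{i,j,\ i \neq j} (R_i \cap R_j) \,\} \right|}{\left| \{\, p \mid p \in \bigcup_i R_i \,\} \right|}$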
Example (figure): Overlap = (¼)/(2) = 1/8 = 12.5 %; WeightedOverlap = (2)/(6) = 1/3 ≈ 33 %
Overlap / WeightedOverlap
• Depending on the kind of data, the appropriate measure can be different.
• If we have uniformly distributed data points, we can use the overlap measure.
• In the case of real data, which can be clustered, the WeightedOverlap is more accurate.
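Both measures can be computed directly for a small set of MBRs. The sketch below is an illustration under assumptions: the function names, the rectangles, and the points are made up and are not the ones from the figure. The volume-based measure is computed exactly by cutting space into cells along every rectangle boundary.

```python
from itertools import product

def covers(rect, point):
    lo, hi = rect
    return all(l <= x <= h for l, x, h in zip(lo, point, hi))

def overlap(rects):
    """Volume covered by more than one MBR, divided by the volume covered by any MBR."""
    dims = len(rects[0][0])
    # Cut every axis at each rectangle boundary; each cell is then covered uniformly.
    cuts = [sorted({r[side][k] for r in rects for side in (0, 1)}) for k in range(dims)]
    multi = total = 0.0
    for cell in product(*[list(zip(c, c[1:])) for c in cuts]):
        vol = 1.0
        for a, b in cell:
            vol *= b - a
        centre = tuple((a + b) / 2 for a, b in cell)
        n = sum(covers(r, centre) for r in rects)
        if n >= 1:
            total += vol
        if n >= 2:
            multi += vol
    return multi / total

def weighted_overlap(rects, points):
    """Same ratio, but counting data points instead of measuring volume."""
    counts = [sum(covers(r, p) for r in rects) for p in points]
    return sum(c >= 2 for c in counts) / sum(c >= 1 for c in counts)

rects = [((0, 0), (2, 2)), ((1, 1), (3, 3))]
points = [(0.5, 0.5), (1.5, 1.5), (1.2, 1.8), (2.5, 2.5), (2.8, 1.2), (0.2, 1.9)]
print(overlap(rects), weighted_overlap(rects, points))
```

In this made-up data set two of the six points sit inside the small shared area, so the weighted measure comes out higher than the volume-based one.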
X-Tree
• Avoids overlap in the directory.
• The X-Tree is a hybrid of a linear array-like directory and a hierarchical R-Tree-like directory.
• In low dimensions the most efficient organization of the directory is hierarchical organization.
• For high dimensionality a linear organization is more efficient.
X-Tree
• In the X-Tree we have 3 types of nodes: data nodes, normal directory nodes, and supernodes.
• The supernodes avoid splits in the directory, so searching is faster.
• This is not the same as an R*-Tree with larger blocks, because the X-Tree creates larger nodes only when necessary.
X-Tree
(Figure: X-Tree structure showing supernodes, normal directory nodes, and data nodes.)
Creation of supernodes
• They are only created if there is no other possibility to avoid overlap during insertion.
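The decision between splitting and creating a supernode could look roughly like the sketch below. It is a simplification under assumptions: the only candidate splits are median splits along each axis, and the `max_overlap` threshold and the function names are hypothetical rather than taken from the slides.

```python
def mbr_of(rects):
    dims = len(rects[0][0])
    return (tuple(min(r[0][k] for r in rects) for k in range(dims)),
            tuple(max(r[1][k] for r in rects) for k in range(dims)))

def volume(rect):
    v = 1.0
    for lo, hi in zip(*rect):
        v *= max(0.0, hi - lo)      # an empty intersection has volume 0
    return v

def split_overlap(left, right):
    """Overlap of the two MBRs that a candidate split would create."""
    a, b = mbr_of(left), mbr_of(right)
    inter = (tuple(max(x, y) for x, y in zip(a[0], b[0])),
             tuple(min(x, y) for x, y in zip(a[1], b[1])))
    shared = volume(inter)
    union = volume(a) + volume(b) - shared
    return shared / union if union > 0 else 0.0

def handle_overflow(entries, max_overlap=0.2):
    """Split an overflowing directory node, or keep it whole as a supernode."""
    best = None
    for axis in range(len(entries[0][0])):
        ordered = sorted(entries, key=lambda r: r[0][axis])
        mid = len(ordered) // 2
        candidate = (split_overlap(ordered[:mid], ordered[mid:]),
                     ordered[:mid], ordered[mid:])
        if best is None or candidate[0] < best[0]:
            best = candidate
    if best[0] <= max_overlap:
        return ("split", best[1], best[2])
    return ("supernode", entries)   # no acceptable split: extend the node instead

separable = [((0, 0), (1, 1)), ((0.5, 0), (1.5, 1)), ((4, 0), (5, 1)), ((4.5, 0), (5.5, 1))]
tangled = [((0, 0), (10, 10)), ((1, 1), (11, 11)), ((0, 1), (10, 11)), ((1, 0), (11, 10))]
print(handle_overflow(separable)[0], handle_overflow(tangled)[0])
```

The first set of entries separates cleanly and is split; the second cannot be divided without heavy overlap, so the node is kept as one larger supernode.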
TV-Tree (Telescopic-Vector tree)
• The basic idea of the TV-Tree is to use feature vectors that contract and extend dynamically (like in classification).
TV-Tree
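• The m-contractions of x form a sequence of vectors $A_m x$, where $A_m$ is a contraction matrix.
• A natural $A_m$ has ones on the diagonal and zeros elsewhere:
• $A_m = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & \cdots & 0 & 1 \end{pmatrix}$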
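A small sketch of the contraction, assuming that the natural matrix is an m × n matrix with ones on its diagonal, so that it simply keeps the first m features. The function names are hypothetical.

```python
def contraction_matrix(m, n):
    """m x n matrix with ones on the diagonal and zeros elsewhere (assumed form of A_m)."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(m)]

def contract(x, m):
    """Return the m-contraction A_m x of the feature vector x."""
    a_m = contraction_matrix(m, len(x))
    return [sum(a * v for a, v in zip(row, x)) for row in a_m]

x = [0.9, 0.4, 0.7, 0.1]
# The same vector seen through a telescope that is extended one feature at a time.
telescope = [contract(x, m) for m in range(1, len(x) + 1)]
for view in telescope:
    print(view)
```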
Multiple shapes
• We can use, for example, a sphere, because it is only a center and a radius r. It represents the set of points with Euclidean distance ≤ r from the center.
• The Euclidean distance is a special case of the Lp metrics, with p = 2.
• For the L1 metric (Manhattan distance), it defines a diamond shape.
• The TV-Tree works with any Lp-sphere.
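The effect of p on the shape of the sphere can be checked with a few lines of code. This is a minimal sketch; the function names are hypothetical.

```python
def lp_distance(a, b, p):
    """Lp distance: p = 1 is the Manhattan distance, p = 2 the Euclidean distance."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)

def in_lp_sphere(point, center, radius, p):
    """True if `point` lies in the Lp-sphere given by `center` and `radius`."""
    return lp_distance(point, center, p) <= radius

center, radius = (0.0, 0.0), 1.0
probe = (0.7, 0.6)
print(in_lp_sphere(probe, center, radius, p=2))  # inside the circle
print(in_lp_sphere(probe, center, radius, p=1))  # outside the diamond
```

The same center and radius describe a different region for each p, which is why one node format can serve for every Lp-sphere.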
TV-Tree principle
• So the TV-Tree treats the attributes asymmetrically, favoring the first few features over the rest.
• The TV-Tree can use any type of MBR (minimum bounding region): rectangle, cube, sphere, etc.
• The TV-Tree can use any Lp-sphere.
TV-Tree node structure
• Each node represents the MBR of all its descendants (say, an Lp-sphere).
• Each region is represented by a center, which is a telescopic vector, and a radius.
• So we talk about a TMBR (telescopic minimum bounding region).
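One possible way to represent such a region is sketched below. The slides only state that a region is a telescopic center plus a radius; the split of the center into inactive leading dimensions (which must match exactly) and active trailing dimensions (which enter the distance), as well as the `TMBR` class layout, are assumptions made for this illustration.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple

@dataclass
class TMBR:
    center: Tuple[float, ...]  # telescopic vector: only the first len(center) features
    radius: float
    active: int                # assumption: the last `active` dimensions of center are active

    def contains(self, x: Sequence[float], p: float = 2) -> bool:
        k = len(self.center)
        inactive = k - self.active
        # Inactive dimensions: every descendant is assumed to agree with the center.
        if tuple(x[:inactive]) != self.center[:inactive]:
            return False
        # Active dimensions: Lp distance to the center must not exceed the radius.
        dist = sum(abs(a - b) ** p
                   for a, b in zip(x[inactive:k], self.center[inactive:])) ** (1.0 / p)
        return dist <= self.radius

region = TMBR(center=(3.0, 1.0, 2.0), radius=1.0, active=2)
print(region.contains((3.0, 1.5, 2.5, 9.0)))  # features beyond the center are ignored
print(region.contains((4.0, 1.0, 2.0, 0.0)))  # differs on the inactive dimension
```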
TV-1-Tree example
TV-2-Tree example
(Figures: each TMBR is labelled with its active dimensions: y; x; z; x,z; x,y.)
What is the best number of active dimensions ?
• They found that the best number of active dimensions is two.
TV-Tree conclusion
• We accept overlap, and therefore multiple paths to search.
• The branch for a new point is chosen with the following criteria: