Spatial databases
-
Upload
neha-kulkarni -
Category
Engineering
-
view
39 -
download
2
Transcript of Spatial databases
SPATIAL DATABASES
By-Neha KulkarniME Computer
Pune Insitute of Computer Technology
What is spatial data? Types of spatial data Types of queries Applications Indexing Techniques Comparison of Indexing techniques GiST Indexing High-dimensional data Conclusion
Agenda
Spatial Data• Spatial data represent the location ,size
and shape of an object on earth• Ex. Building, lake
Point data: Line data: polygon data:
Representation of Spatial Data
Point Data: Simplest form of representing spatial data No space and has no associated area or volume Consists of collection of points Ex. Raster data
Types of Spatial Data
Region data: Has spatial extend with location and boundaries Represented using of points, line, polygons Ex. roads, rivers: line data
1) Spatial range queries: Related with region dataEx. “Find all cities within 50 miles of Pune”
2)Nearest Neighbor queries:Related with point dataEx. “Find 10 cities nearest to Pune”In ordered citiesUse in multimedia database
Types Of Queries
3) Spatial Join Queries:- use both point and region data.- Ex. “Find pairs of cities within 200 miles of each other AND “ Find all cities near a lake”- More complex- Expensive to evaluate
1) Geographic Information System(GIS) Ex. MAP2) Computer Aided Design/Manufacturing(CAD/CAM) Ex. Surface of design object Range and Spatial join queries used3) Multimedia Database video, audio, image, text also required spatial data Nearest neighbor queries and point data
Applications of Spatial Data
Point Data: Grid files, ḥE trees, Kdtrees, point quad trees Region data: Quad trees, R trees, SKD trees, -Yet no best indexing technique- R trees are commonly used : due to simplicity, ability to handle both data performance to complex queries
Indexing Spatial Data
Three main indexing techniques :
Region Quad-Trees and Z-Ordering – handle both point and region data
Grid Files – only point data R-Trees – handle both point and region data
Indexing techniques
Z-ordering gives us a way to group points according to spatial proximity.
Consider X-01 and Y-11 Z-value is 0111 by interleaving X and Y values. This gives us the value for the point 7.
Region Quad-Trees and Z-Ordering
Space filling curves
The Region Quad tree structure corresponds directly to the recursive decomposition of the data space.
Each node in the tree corresponds to a square-shaped region of the data space.
Grid files rely upon a grid directory to identify the data page containing a desired point.
The Grid file partitions space into rectangular regions using lines that are parallel to the axes.
If the X axis is cut into i segments and the Y axis is cut into j segments, we have a total of i x j partitions. The grid directory is an i by j array with one entry per partition.
This description is maintained in an array called a linear scale; there is one linear scale per axis.
Grid Files
Searching for a point in a grid file
Inserting points in a Grid File
Adaptation of B+ Tree Height-balanced data structure Search key values are referred to as
Bounding Boxes A data entry consists of a pair (n-
dimensional box, Rid) Rid – object Identifier N-dimensional box is the smallest box that
contains the object
R-Tree
An example R-Tree
Search for Objects Overlapping Box Q
Start at root.1. If current node is non-leaf, for each entry <E, ptr>, if box E overlaps Q, search subtree identified by ptr.2. If current node is leaf, for each entry <E, rid>, if E overlaps Q, rid identifies an object that might overlap Q.
Searching in a R-Tree
Insert Entry <B, ptr> Start at root and go down to “best-fit” leaf L. Go to child whose box needs least enlargement to cover B; resolve ties by going to smallest area child. If best-fit leaf L has space, insert entry and stop. Otherwise, split L into L1 and L2. Adjust entry for L in its parent so that the box now covers (only) L1. Add an entry (in the parent node of L) for L2. (This could cause the parent node to recursively split.)
Insertion in a R-Tree
Region Quad Trees
Grid Files(point data)
R-Trees
Range Queries Easily handled
Easily handled for point data.
Handled by calculating bounding box
Nearest Neighbour Queries
Can be handled.Sometimes tricky due to long diagonal jumps
Easily handled for point data.
Handled well by traversing for the point or region
Spatial Joins Can be handled with some extension to range queries
Easily handled for point data.
Handled very well
Comparison between indexing techniques wrt. queries
The Generalized Search Tree (GiST) abstracts the “tree” nature of a class of indexes including B+ trees and R-tree variants. Striking similarities in insert/delete/search and even concurrency control algorithms make it possible to provide “templates” for these algorithms that can be customized to obtain the many different tree index structures. GiST provides an alternative for implementing other tree indexes in an ORDBMS.
GiST
Typically, high-dimensional datasets are collections of points, not regions. E.g., Feature vectors in multimedia applications. Very sparse Nearest neighbor queries are common. R-tree becomes worse than sequential scan for most datasets with more than a dozen dimensions. As dimensionality increases contrast (ratio of distances between nearest and farthest points) usually decreases; “nearest neighbor” is not meaningful.
Indexing High-dimensional data
Spatial data management has many applications, including GIS, CAD/CAM, multimedia indexing, Point and region data
R-tree approach is widely used in GIS systems
Used in spatial data mining approaches. Popular SDBMS : MySQL(geometry
datatype), Neo4j, AllegroGraph, SpaceBase, CouchDB, PostGreSQL, SpatialDB
Conclusion
“Database Management Systems” by Raghu Ramakrishnan, 3rd Edition
www.techopedia.com/definition dna.fernuni-hagen.de/IntroSpatialDBMS www.geol-amu.org/notes
References
Thank you!!