Chapter 1
Geodatabases
Shubham Bansal
This seminar introduces the basics of Geodatabases and explains their overall structure with some examples. It also briefly discusses the different types of Geodatabases, their architectures and their advantages and disadvantages.
Later in this paper you will find technical details about the data types used in Geodatabases, the operations performed to process and analyze geographical data, the data structures behind them and the algorithms used for indexing Geodatabases.
Finally, there are some SQL queries that deal with the geographical data stored in Geodatabases.
The motivation behind this paper is to get a clear picture of Geodatabases and of how they manage spatial data internally. Judging by the name, Geodatabases look very simple, but the biggest challenge is how they actually store and manage geographical data. In this paper I explain some of the topics that are internally related to Geodatabases: their data types, their datasets (what kinds of data they support), their relation to Relational Database Management Systems (RDBMS), and how indexing algorithms are adapted to deal efficiently with queries on the geographical data stored in Geodatabases.
This paper also covers aspects of usage, applications and the different fields where Geodatabases play an important role, as well as the names of a few organizations that are key players in GIS and Geodatabase systems.
1.1 Definition of Geodatabases
A Geodatabase is part of a Geographical Information System (GIS) and helps to store and manipulate geographical data. The word Geodatabase is a combination of two words: Geo + Database [1].
Geo refers to spatial data, i.e. data that identifies locations and boundaries on the earth. In other words, spatial data stores the coordinates and topology of a location that can be mapped. This spatial data is later processed and analyzed with GIS systems [2].
Geodatabases are the underlying data structure of any GIS system and are used for editing and data management tasks. A wide range of database management systems (DBMS) and ordinary file systems serve as the base for Geodatabases. These DBMS and file systems come in many sizes and support different numbers of users [3].
1.2 Generation of Geodatabases
The following are the different generations of Geodatabases:
First Generation:
• In the first generation of Geodatabases, the spatial data is stored outside the DBMS in separate files.
• Each file is mapped to a unique ID, as shown in Figure 1.1.
Figure 1.1: 1st Generation Geodatabases
Second Generation: In the second generation of Geodatabases, the spatial data is stored inside the DBMS in a separate column called GEOMETRY. In addition, the DBMS engine is enhanced to support SQL queries on spatial data.
Overall, the spatial data is linked to and stored together with the other data in tabular format in the DBMS, as shown in Figure 1.2.
Figure 1.2: 2nd Generation Geodatabases
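To make the difference between the two generations concrete, here is a minimal sketch using Python's built-in sqlite3 module. It is only an illustration under stated assumptions: the table names, the `external_geometry_files` dict (standing in for the separate geometry files) and the WKT text column (standing in for a real GEOMETRY type, which SQLite does not have) are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# First generation (sketch): attributes live in a DBMS table, while the
# geometry lives outside the DBMS (a dict stands in for separate files),
# linked to the table through a unique ID.
conn.execute("CREATE TABLE parcels_gen1 (id INTEGER PRIMARY KEY, owner TEXT)")
conn.execute("INSERT INTO parcels_gen1 VALUES (1, 'City')")
external_geometry_files = {1: "POLYGON((0 0, 10 0, 10 10, 0 10, 0 0))"}


def first_gen_lookup(parcel_id):
    (owner,) = conn.execute(
        "SELECT owner FROM parcels_gen1 WHERE id = ?", (parcel_id,)
    ).fetchone()
    # A second lookup outside the DBMS is needed to fetch the geometry.
    return owner, external_geometry_files[parcel_id]


# Second generation (sketch): the geometry is a column of the same table.
# SQLite has no spatial type, so WKT text stands in for a GEOMETRY column.
conn.execute(
    "CREATE TABLE parcels_gen2 (id INTEGER PRIMARY KEY, owner TEXT, geometry TEXT)"
)
conn.execute(
    "INSERT INTO parcels_gen2 VALUES "
    "(1, 'City', 'POLYGON((0 0, 10 0, 10 10, 0 10, 0 0))')"
)


def second_gen_lookup(parcel_id):
    # One query returns attributes and geometry together.
    return conn.execute(
        "SELECT owner, geometry FROM parcels_gen2 WHERE id = ?", (parcel_id,)
    ).fetchone()
```

Both lookups return the same pair; the point of the second generation is that a single query against a single table is enough, and a spatially enhanced engine can then evaluate spatial predicates on that column directly.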
1.3 Types of Geodatabases
There are three categories of Geodatabases, namely personal, file and ArcSDE Geodatabases. Each category is discussed in the following subsections [4]:
1.3.1 Personal Geodatabases
A personal Geodatabase is a single-user database. Personal Geodatabases use Microsoft Access (.mdb) files to store the spatial information. The following limitations apply to this type:
• The maximum database size is 2 GB.
• Only the Windows platform is supported.
1.3.2 File Geodatabases
This type of Geodatabase is also aimed at single-user editing. File Geodatabases store the spatial information as a collection of ordinary files in a folder on disk; the folder has the ".gdb" extension.
The following limitation applies to this type:
• The maximum size is 1 TB per table.
1.3.3 ArcSDE Geodatabases
This type of Geodatabase, also simply called ArcSDE, is built on top of an RDBMS (Figure 1.3). It provides a multiuser environment by offering a central spatial data storage location. It also provides access control for individual users as well as backup and recovery options. Together, these features make it more scalable than the other types. Despite these advantages, it still has some limitations: for example, it is not platform independent, and different software versions support different platforms (Windows, Linux, Mac).
Figure 1.3: High Level Architecture
Geodatabases can also be divided into two categories according to the number of users. One can choose between them on the basis of the following advantages and disadvantages:
Single User Geodatabases
A Geodatabase that supports only one editor at a time is called a single-user Geodatabase. File Geodatabases allow multiple concurrent editors as long as they edit different feature datasets or tables, whereas personal Geodatabases do not allow concurrent editing. This kind of Geodatabase has a limited capacity for storing geospatial data [5].
Multi User Geodatabases
In multi-user Geodatabases, multiple users can work in parallel and perform different geospatial operations, including editing, concurrently. Multi-user Geodatabases suit organizations of all sizes (small, medium and large) because their data storage capacity depends only on the size of the server. This type also supports all spatial data types [6].
1.4 Architecture of a Geodatabase
The architecture of a Geodatabase can be described in terms of the following four aspects [7][8]:
• Storage: the basic operation of a Geodatabase is to physically store geographic information in an underlying DBMS or file system. Beyond this physical storage, a Geodatabase has the following key aspects.
• Information model: a model for representing and managing geographic information, based on a collection of data tables that hold different geographic datasets (feature classes, raster datasets and attributes).
• Application logic: a layer for accessing and working with geographic data in different files and formats.
• Transaction model: a model for managing the flow of GIS data within the GIS system.
1.5 Geodatabases Datasets
A Geodatabase has three kinds of datasets for managing geographic information: tables, feature classes and raster datasets. Designing and building these datasets is the primary step in creating a new Geodatabase. Users first design these datasets; later they can add advanced features to the Geodatabase, such as topologies and networks. Geodatabase storage includes both the schema and the set of rules for every dataset, and table-based storage for spatial and attribute data [9].
1.5.1 Table Basics
The following types are used to hold and manage attribute information in Geodatabases [10]:
• Numbers: numeric values such as short integers, long integers, floats and doubles.
• Text: strings of alphanumeric characters.
• Date: date and time values.
• BLOBs: binary large objects, used to store data such as images.
• Global Identifiers: used to manage relationships for data management, versioning, updates and replication. A global identifier is a registry-style string of 36 characters enclosed in curly brackets that uniquely identifies a feature or a table row within and across Geodatabases.
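The registry-style format of a global identifier can be illustrated with Python's standard uuid module. This is a minimal sketch, not how any Geodatabase actually generates its IDs; the helper names `new_global_id` and `is_global_id` are hypothetical, and the upper-case hex digits are an assumption about the usual display form.

```python
import re
import uuid

# 8-4-4-4-12 hex digits (36 characters including hyphens) inside curly brackets.
GLOBAL_ID_PATTERN = re.compile(
    r"\{[0-9A-F]{8}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{12}\}"
)


def new_global_id() -> str:
    """Return a random UUID formatted as a registry-style string, e.g. {XXXXXXXX-...}."""
    # str(uuid4()) is the 36-character hyphenated form; upper-casing is an assumption.
    return "{" + str(uuid.uuid4()).upper() + "}"


def is_global_id(value: str) -> bool:
    """Check that the whole string matches the registry-style layout."""
    return GLOBAL_ID_PATTERN.fullmatch(value) is not None
```

Because the 36 characters come from a random 128-bit UUID, two independently generated identifiers collide only with negligible probability, which is what lets them identify rows across Geodatabases, for example during replication.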
1.5.2 Feature class basics
A feature class holds a homogeneous collection of features in which every feature has the same spatial representation (such as points, lines or polygons) and the same set of attribute columns [11].
For example, a point feature class may represent specific locations, and a line feature class may represent roads.
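The homogeneity rule (one geometry type, one set of attribute columns) can be expressed as a small validating container. This is a minimal sketch; the `FeatureClass` class, its fields and the geometry type names are hypothetical and do not correspond to any ESRI API.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class FeatureClass:
    name: str
    geometry_type: str            # e.g. "Point", "Polyline", "Polygon"
    attribute_fields: tuple       # column names every feature must provide
    features: list = field(default_factory=list)

    def add(self, geometry_type: str, geometry: Any, attributes: dict) -> None:
        # Reject features whose geometry type differs from the class.
        if geometry_type != self.geometry_type:
            raise ValueError(f"{self.name} only holds {self.geometry_type} features")
        # Reject features whose attribute columns differ from the class.
        if set(attributes) != set(self.attribute_fields):
            raise ValueError(f"attributes must be exactly {self.attribute_fields}")
        self.features.append((geometry, attributes))


roads = FeatureClass("roads", "Polyline", ("name", "lanes"))
roads.add("Polyline", [(0, 0), (5, 0)], {"name": "Main St", "lanes": 2})
```

Adding a point or a feature with a different set of attributes to `roads` raises `ValueError`, mirroring the idea that every row of a feature class shares the same geometry type and schema.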
Points, lines, polygons and annotations are the most commonly used feature classes in Geodatabases. Along with these, vector features, i.e. geographic objects with vector geometry, are frequently used to represent objects with discrete boundaries such as walls, streets and rivers.
In simple words, a feature is an object that holds a graphical representation, which is typically a point, a line or a polygon.
The following feature classes are used to hold the graphical representation of an object:
• Points: represent features that are very small, such as a GPS observation.
• Lines: represent the shape and location of geographic objects such as street lines and streams, or more generally objects that have length but no area.
• Polygons: represent the shape and location of area objects such as states, countries and land-use zones.
• Annotation: shows descriptive properties of geographic objects; this class is responsible for rendering text for graphical objects.
• Dimensions: show lengths and distances of graphical objects, e.g. the distance between two entities.
• Multipoints: represent features that are composed of more than one point.
• Multipatches: represent the outer surface of geographic objects that occupy an area or volume in 3-D space, ranging from simple objects (triangles and cubes) to complex objects (isosurfaces and buildings).
1.5.3 Raster basics
Raster datasets represent geographic elements by dividing space into a grid of discrete square or rectangular cells. Every cell holds a value that represents some characteristic of that location, as shown in Figure 1.4.
Raster datasets are commonly used to represent imagery, digital models and other data. Rasters are also often used to represent point, line and polygon features; the example in Figure 1.4 shows how a series of polygons would be represented as a raster dataset.
Rasters are interesting for at least two reasons: first, they can be used to represent all geographic information (features, images and surfaces); second, they have a rich set of analytic geoprocessing operators. Therefore, in addition to being a universal data type for holding imagery in GIS, rasters are also heavily used to represent features, enabling all geographic objects to be used in raster-based modeling and analysis [12].
Figure 1.4: Raster Representation [12]
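The conversion of polygons into a raster can be sketched by testing the center of every cell against each polygon. This is a simplified illustration, assuming a grid whose origin is at (0, 0), a uniform `cell_size`, and the common "cell center" rule; the function names are hypothetical, and real GIS tools offer more options (e.g. maximum-area rules).

```python
def point_in_polygon(x, y, ring):
    """Ray casting: count crossings of a horizontal ray going right from (x, y)."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the ray's y value
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def rasterize(polygons, width, height, cell_size=1.0, nodata=0):
    """polygons: list of (value, ring). Later polygons overwrite earlier ones."""
    grid = [[nodata] * width for _ in range(height)]
    for row in range(height):
        for col in range(width):
            cx = (col + 0.5) * cell_size
            cy = (height - row - 0.5) * cell_size  # row 0 is the top of the grid
            for value, ring in polygons:
                if point_in_polygon(cx, cy, ring):
                    grid[row][col] = value
    return grid


zones = [
    (1, [(0, 0), (2, 0), (2, 2), (0, 2)]),  # zone 1 in the lower-left corner
    (2, [(2, 2), (4, 2), (4, 4), (2, 4)]),  # zone 2 in the upper-right corner
]
grid = rasterize(zones, width=4, height=4)
```

Each polygon becomes a block of cells carrying its value, and every cell not covered by a polygon keeps the `nodata` value; a finer `cell_size` follows the polygon boundaries more closely at the cost of more cells.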
1.6 Geodatabase storage in Relational Databases
The storage model behind Geodatabases is the basic relational DBMS model. This backbone provides Geodatabases with a simple and effective data model for storing and working with GIS data. In this data model [13]:
• Data is stored in tables.
• Each table consists of multiple rows, and every row has the same number of columns.
• Every column has a data type, i.e. the kind of value stored in it, such as int, long, float, date, time, char, and new data types for spatial information.
• Relationships between tables map rows of one table to rows of another table through a common column in the related tables.
Geodatabase storage includes both the database schema and rules for the geographic datasets, and tabular storage for spatial and attribute data. The schema consists of the definition, behavior and integrity rules for every object, as shown in Figure 1.5.
Figure 1.5: RDBMS Representation
Spatial objects, on the other hand, are most commonly stored either as raster datasets or as vector features in tabular form along with other attributes. A feature class can be stored in a table in which each row represents a feature and a column named SHAPE holds the geometry (shape) of the corresponding feature. The SHAPE column can be of one of two types:
• a BLOB column, which holds the geometry in binary form;
• a spatial column type.
A set of features with a common spatial representation (point, line or polygon) and a common set of attributes, each stored in a separate column, is referred to as a feature class; it is stored and managed in a single table [14].
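A feature class stored as one table, with one row per feature and the geometry in a BLOB SHAPE column, can be sketched with sqlite3 and struct. The binary layout used here (a point count followed by little-endian x/y doubles) is a hypothetical encoding chosen for illustration; it is not ESRI's storage format. The table and column names are also assumptions.

```python
import sqlite3
import struct


def encode_points(points):
    # Hypothetical layout: uint32 point count, then x/y doubles (little-endian).
    buf = struct.pack("<I", len(points))
    for x, y in points:
        buf += struct.pack("<dd", x, y)
    return buf


def decode_points(blob):
    (count,) = struct.unpack_from("<I", blob, 0)
    # Each point occupies 16 bytes after the 4-byte header.
    return [struct.unpack_from("<dd", blob, 4 + 16 * i) for i in range(count)]


conn = sqlite3.connect(":memory:")
# One table = one feature class: attribute columns plus the SHAPE column.
conn.execute(
    "CREATE TABLE roads (OBJECTID INTEGER PRIMARY KEY, NAME TEXT, "
    "LANES INTEGER, SHAPE BLOB)"
)
conn.executemany(
    "INSERT INTO roads (NAME, LANES, SHAPE) VALUES (?, ?, ?)",
    [
        ("Main St", 2, encode_points([(0.0, 0.0), (5.0, 0.0)])),
        ("Oak Ave", 4, encode_points([(0.0, 0.0), (0.0, 3.0), (2.0, 3.0)])),
    ],
)
```

With a BLOB column the database itself cannot interpret the geometry, so the client has to decode it; a spatial column type moves that knowledge into the database engine, which is what allows spatial SQL functions to work on it.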
Raster data can also be stored in tables, but because of its large size a separate block table is maintained: the raster is cut into small chunks, and each chunk is stored in a separate row of the table.
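The block-table idea can be sketched as follows: the raster is split into fixed-size tiles and each tile becomes one row keyed by its block position. This is a simplified sketch, assuming 8-bit cell values (so a tile fits into `bytes`) and a hypothetical `raster_blocks` table; real raster storage also records compression, pyramids and band information.

```python
import sqlite3


def split_into_blocks(raster, block_size):
    """Yield (block_row, block_col, data) for each block_size x block_size tile."""
    rows, cols = len(raster), len(raster[0])
    for br in range(0, rows, block_size):
        for bc in range(0, cols, block_size):
            tile = [raster[r][bc:bc + block_size]
                    for r in range(br, min(br + block_size, rows))]
            # Assumes cell values fit into one byte (0..255).
            data = bytes(v for line in tile for v in line)
            yield br // block_size, bc // block_size, data


# An 8 x 8 raster with one byte per cell.
raster = [[(r * 8 + c) % 256 for c in range(8)] for r in range(8)]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE raster_blocks (block_row INTEGER, block_col INTEGER, "
    "block_data BLOB, PRIMARY KEY (block_row, block_col))"
)
conn.executemany(
    "INSERT INTO raster_blocks VALUES (?, ?, ?)", split_into_blocks(raster, 4)
)
```

Here the 8 x 8 raster becomes four rows of 16 bytes each. A query for a small area only needs to read the blocks that cover it instead of the whole raster.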
Below are some examples of how different databases store vector and raster data. Most databases have already added spatial data storage and support for spatial SQL queries.
Different databases have different column types that hold the vector and raster geometry:
• Oracle supports both its own spatial data types and the data types defined by ArcSDE [14].
• IBM DB2 uses the Spatial Extender geometry object [14].
• Informix uses the Spatial DataBlade geometry object [14].
• PostgreSQL uses the ArcSDE spatial type (ST_Geometry) or PostGIS geometries [14].
• Microsoft SQL Server uses the Microsoft spatial types geometry and geography [14].
1.7 Geodatabase field data types
ST_Geometry is an SQL data type used for Geodatabase storage in Oracle, IBM DB2 and PostgreSQL. This SQL data type was developed by ESRI, and it is the default data type for storing object geometry in a PostgreSQL Geodatabase [15].
In this section we discuss in detail the Geodatabase data types and their subtypes for storing spatial information. Before doing so, let us briefly look at the common storage mechanism of Geodatabases.
In general, spatial data is stored in relational database tables according to the feature storage model. A feature table can have multiple rows; each row holds one feature, and the geometry of the feature is stored in a single column called SHAPE, which holds the polygon, line or point geometry. The feature class is a widely used storage model for Geodatabases because it fits very well with the SQL processing engine. In addition, feature classes have the following advantages:
• One column holds the complete geometry of a feature.
• The data structure of the physical schema is fast, scalable and simple.
• Writing an interface is easy from a programmer's point of view.
• Interoperability, i.e. it is easy to move data in and out.
Figure 1.6 shows the hierarchy of data types supported for geographical data.
Let us now look at some important data types and their supported functions.
POINT [17]
This data type is zero-dimensional (0-D). It is used to store the position of an object in space and to define features such as landmarks, hospitals and any other location specified by the user.
POLYGON [17]
This data type is used to store sequences of points that represent a two-dimensional surface. The stored points define the exterior boundary of the polygon and, optionally, interior boundaries (holes). This data type is used to define parcels of land, water bodies and other area objects.
Figure 1.6: Spatial Data Types [16]
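The following functions are supported by this data type:
• Area: returns the area of the polygon.
• ExteriorRing: returns the exterior ring of the polygon.
• NumInteriorRing: returns the number of interior rings (holes).
• InteriorRingN: returns the Nth interior ring.
• Centroid: returns the geometric center of the polygon.
• PointOnSurface: returns a point that is guaranteed to lie on the polygon.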
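Several of these polygon functions can be sketched in a few lines using the shoelace formula. This is a minimal illustration, not the implementation used by any database: the `Polygon` class and its method names are hypothetical stand-ins for Area, ExteriorRing, NumInteriorRing, InteriorRingN and Centroid, and PointOnSurface is omitted because it needs a more involved algorithm.

```python
def _ring_area_centroid(ring):
    """Signed area and centroid of a ring (list of (x, y)) via the shoelace formula."""
    a = cx = cy = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        cross = x1 * y2 - x2 * y1
        a += cross
        cx += (x1 + x2) * cross
        cy += (y1 + y2) * cross
    a /= 2.0
    # The sign of a cancels out, so the centroid is orientation-independent.
    return a, (cx / (6.0 * a), cy / (6.0 * a))


class Polygon:
    def __init__(self, exterior, interiors=()):
        self.exterior = list(exterior)
        self.interiors = [list(ring) for ring in interiors]

    def exterior_ring(self):
        return self.exterior

    def num_interior_rings(self):
        return len(self.interiors)

    def interior_ring_n(self, n):
        # 1-based index, following the naming of InteriorRingN.
        return self.interiors[n - 1]

    def area(self):
        outer = abs(_ring_area_centroid(self.exterior)[0])
        holes = sum(abs(_ring_area_centroid(r)[0]) for r in self.interiors)
        return outer - holes

    def centroid(self):
        # Area-weighted combination: exterior centroid minus hole centroids.
        a, (x, y) = _ring_area_centroid(self.exterior)
        total, sx, sy = abs(a), abs(a) * x, abs(a) * y
        for ring in self.interiors:
            ah, (xh, yh) = _ring_area_centroid(ring)
            total -= abs(ah)
            sx -= abs(ah) * xh
            sy -= abs(ah) * yh
        return sx / total, sy / total
```

For a 4 x 4 square with a 1 x 1 hole, `area()` gives 15 and the centroid shifts slightly away from the hole. Note that a centroid can fall outside a concave polygon, which is why a separate PointOnSurface function exists.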
LineString
This data type is used to store a sequence of points that defines a linearly interpolated path, and it has a length. A LineString whose start and end points are the same is closed; a closed LineString that does not cross itself is a ring. This type is used to define roads, tunnels, rivers and power lines.
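The following functions are supported by this data type:
• StartPoint: returns the first point.
• EndPoint: returns the last point.
• PointN: returns the Nth point.
• Length: returns the length as a double-precision value.
• NumPoints: returns the number of points.
• IsRing: returns a boolean value indicating whether the LineString is closed and simple.
• IsClosed: returns a boolean value indicating whether the start and end points are equal.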
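Most of the LineString functions are straightforward to express. The following is a minimal sketch with a hypothetical `LineString` class whose methods mirror StartPoint, EndPoint, PointN, NumPoints, Length and IsClosed; IsRing is deliberately left out because it additionally requires a self-intersection (simplicity) test.

```python
import math


class LineString:
    def __init__(self, points):
        if len(points) < 2:
            raise ValueError("a LineString needs at least two points")
        self.points = [tuple(p) for p in points]

    def start_point(self):
        return self.points[0]

    def end_point(self):
        return self.points[-1]

    def point_n(self, n):
        # 1-based index, following the naming of PointN.
        return self.points[n - 1]

    def num_points(self):
        return len(self.points)

    def length(self):
        # Sum of the Euclidean lengths of consecutive segments.
        return sum(math.dist(a, b) for a, b in zip(self.points, self.points[1:]))

    def is_closed(self):
        return self.points[0] == self.points[-1]


path = LineString([(0, 0), (3, 4), (3, 0), (0, 0)])
```

The sample `path` forms a closed triangle whose segments have lengths 5, 4 and 3, so `path.length()` is 12.0 and `path.is_closed()` is True.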
The data types discussed above are the most frequently used in Geodatabases. Other data types are MultiPoint, MultiLineString and MultiPolygon, which are collections of the corresponding basic data types.
The following SQL statements show how to work with spatial data. They use the syntax of a PostgreSQL database with the PostGIS extension.
Table creation [17]
create table table_name (field_1 integer, field_2 varchar(100),
field_3 float, field_4 geometry);
Spatial data Insertion [17]
insert into table_name values (11, 'this is just an example',
445.63, ST_GeomFromText('POLYGON((1 2, 3 4, 5 2, 1 2))'));
Query for spatial data [17]
select field_1 from table_name
where ST_Overlaps(field_4, ST_GeomFromText('POLYGON((0 0, 4 0, 4 4, 0 4, 0 0))'));
All these operations are performed using normal SQL together with functions specifically defined for spatial data types. The names and functionality of some functions that support spatial operations in SQL are listed below:
Point
This function constructs a point from the given x and y coordinates and a spatial reference ID.
Difference
This function returns the part of the first geometry that does not intersect the second geometry.
Intersects
This function returns 't' if the two geometries intersect (share at least one point); otherwise it returns 'f'.
Equals
This function checks whether two geometries are spatially equal, i.e. identical.
Contains
This function checks whether a given object lies completely inside another object.
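As an illustration of what an Intersects test involves, here is a sketch for the LineString case only, using the classic orientation (cross-product) test on each pair of segments. The function names are hypothetical and the 't'/'f' return values simply mirror the description above; a database implementation handles all geometry types and floating-point robustness issues that this sketch ignores.

```python
def _orientation(p, q, r):
    """Cross product sign: > 0 left turn, < 0 right turn, 0 collinear."""
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])


def _within_box(p, q, r):
    """True if r lies in the bounding box of segment p-q (used for collinear cases)."""
    return (min(p[0], q[0]) <= r[0] <= max(p[0], q[0])
            and min(p[1], q[1]) <= r[1] <= max(p[1], q[1]))


def segments_intersect(a1, a2, b1, b2):
    d1 = _orientation(b1, b2, a1)
    d2 = _orientation(b1, b2, a2)
    d3 = _orientation(a1, a2, b1)
    d4 = _orientation(a1, a2, b2)
    # Proper crossing: each segment's endpoints lie strictly on opposite sides.
    if d1 * d2 < 0 and d3 * d4 < 0:
        return True
    # Touching or collinear overlap: an endpoint lies on the other segment.
    return ((d1 == 0 and _within_box(b1, b2, a1))
            or (d2 == 0 and _within_box(b1, b2, a2))
            or (d3 == 0 and _within_box(a1, a2, b1))
            or (d4 == 0 and _within_box(a1, a2, b2)))


def intersects(line_a, line_b):
    """Return 't' if two LineStrings (lists of (x, y)) share a point, else 'f'."""
    for a1, a2 in zip(line_a, line_a[1:]):
        for b1, b2 in zip(line_b, line_b[1:]):
            if segments_intersect(a1, a2, b1, b2):
                return "t"
    return "f"
```

Two crossing diagonals give 't', two parallel lines give 'f', and lines that merely touch at an endpoint also give 't', because they share that point.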
1.8 Spatial Indexing
Spatial indexing is used to support spatial data selection and spatial operations such as spatial joins and nearest-neighbor searches. The advantage of a spatial index is that it organizes the object space so that a query only has to consider a part, or subset, of the objects. B-trees [18] and R-trees [19] are the main data structures used for this purpose (the R-tree generalizes the balanced-tree idea of the B-tree to multidimensional data); these structures are designed to store either points or rectangles [20] [17].
A spatial index structure groups objects into buckets, where each bucket has an associated region, i.e. a part of space containing all objects in that bucket, as shown in Figure 1.7.
Figure 1.7: Spatial Index structure of rectangles
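The bucket idea can be shown with a one-level sketch: each bucket keeps the minimum bounding rectangle (MBR) of its members, and a window query skips every bucket whose region does not overlap the window. This is not a full R-tree (there is no insertion, node splitting or multi-level tree); the buckets are hand-picked, and the `BucketIndex` class and rectangle format `(xmin, ymin, xmax, ymax)` are assumptions for illustration.

```python
def mbr(rects):
    """Minimum bounding rectangle (xmin, ymin, xmax, ymax) of a list of rectangles."""
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


def overlaps(a, b):
    """True if rectangles a and b share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


class BucketIndex:
    """One-level sketch: each bucket stores its objects plus their covering region."""

    def __init__(self, buckets):
        # buckets: list of lists of (object_id, rect)
        self.buckets = [(mbr([r for _, r in b]), b) for b in buckets]

    def query(self, window):
        hits, buckets_visited = [], 0
        for region, members in self.buckets:
            if not overlaps(region, window):
                continue  # whole bucket skipped without looking at its objects
            buckets_visited += 1
            hits.extend(oid for oid, r in members if overlaps(r, window))
        return sorted(hits), buckets_visited


index = BucketIndex([
    [("a", (0, 0, 1, 1)), ("b", (1, 1, 2, 2))],   # western bucket
    [("c", (8, 8, 9, 9)), ("d", (9, 0, 10, 1))],  # eastern bucket
])
```

A query window near the origin only visits the western bucket; the objects of the eastern bucket are never examined, which is exactly the saving a spatial index provides.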
• Overlapping (Figure 1.8): the requirement that bucket regions partition the space is abandoned, so bucket regions may overlap, as in the R-tree. The advantage of this approach is that each spatial object remains in a single bucket; the disadvantage is that a search may have to follow multiple paths because bucket regions overlap.
• Clipping (Figure 1.9): bucket regions are disjoint, but a data rectangle may be cut into multiple pieces, as in the R+-tree [20]. The advantage of this approach is less branching during search operations; the disadvantage is that a single spatial object has multiple entries.
Figure 1.8: Overlapping Example [17]
Figure 1.9: Clipping Example
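The trade-off between the two strategies can be seen by inserting one rectangle that straddles two bucket regions. In the sketch below, `clip_insert` cuts the rectangle into one piece per disjoint region (R+-tree style), while `overlap_insert` puts it into a single bucket, choosing the one whose region needs the least enlargement (the criterion used by Guttman's R-tree), and lets that region grow into its neighbour. Both functions are simplified, hypothetical helpers for a single level of buckets.

```python
def intersection(a, b):
    """Intersection rectangle of a and b, or None if they do not share an area."""
    x0, y0 = max(a[0], b[0]), max(a[1], b[1])
    x1, y1 = min(a[2], b[2]), min(a[3], b[3])
    return (x0, y0, x1, y1) if x0 < x1 and y0 < y1 else None


def area(r):
    return (r[2] - r[0]) * (r[3] - r[1])


def enlarge(region, rect):
    return (min(region[0], rect[0]), min(region[1], rect[1]),
            max(region[2], rect[2]), max(region[3], rect[3]))


def clip_insert(regions, rect):
    """Clipping: regions stay disjoint; rect is cut into one piece per region it crosses."""
    pieces = {}
    for name, region in regions.items():
        piece = intersection(region, rect)
        if piece is not None:
            pieces[name] = piece
    return pieces


def overlap_insert(regions, rect):
    """Overlapping: rect goes into one bucket; that region grows and may overlap others."""
    best = min(regions, key=lambda n: area(enlarge(regions[n], rect)) - area(regions[n]))
    regions[best] = enlarge(regions[best], rect)
    return best


regions = {"left": (0, 0, 5, 10), "right": (5, 0, 10, 10)}
rect = (4, 2, 7, 3)  # straddles the boundary x = 5
pieces = clip_insert(regions, rect)           # two entries for one object
grown = dict(regions)
chosen = overlap_insert(grown, rect)          # one entry, but regions now overlap
```

With clipping the object is stored twice, as `(4, 2, 5, 3)` and `(5, 2, 7, 3)`; with overlapping it is stored once in the right bucket, whose region grows to `(4, 0, 10, 10)` and now overlaps the left one, so a search near x = 4.5 must visit both buckets.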
1.9 Benefits of Geodatabases
Geodatabases help to create a central repository of geospatial data that can be easily accessed from mobile, web and desktop clients, and they provide functionality for applying different operations to the data and for setting up relationships among data.
Figure 1.10: Geodatabase Benefits
1.10 Different Areas for Geodatabases
Spatial databases have been an active area of research for more than two decades. The results of this research, e.g. spatial multidimensional indexes, are being used in a number of areas such as [21]:
• Geographic Information Systems (GIS).
• Computer Aided Design (CAD).
• Multimedia Information Systems (MMIS).
• Data Warehousing (DWH).
• NASA's Earth Observation System (EOS).
• Storing information about the physical world, such as geography, urban planning and astronomy.
1.11 Conclusion
This paper gives a general overview of Geodatabases: their need, different applications, the geo-relational storage concept, geo-SQL query processing, the data structures behind Geodatabases and their benefits.
This paper can also help in learning basic Geodatabase terminology, such as how Geodatabases are interlinked with the underlying relational database management systems and their basic datasets for supporting geospatial information. In addition, one can find examples of the different data types supported by Geodatabases to store spatial information, functions that help to resolve geospatial SQL queries, and sample SQL statements to create tables, insert values into a table and query geo-specific information.
Bibliography
[1] http://www.esri.com/news/arcnews/
winter0809articles/the-geodatabase.html.
[2] http://www.coastalwiki.org/coastalwiki/
GIS/#_ref-Cox_1.
[3] http://www.esri.com/software/arcgis/
geodatabase/index.html.
[4] http://webhelp.esri.com/arcgisdesktop/
9.2/index.cfm?topicname=types_of_
geodatabases.
[5] http://www.esri.com/software/arcgis/
geodatabase/single-user-geodatabase.
[6] http://www.esri.com/software/arcgis/
geodatabase/multi-user-geodatabase.
[7] http://webhelp.esri.com/arcgisdesktop/
9.2/index.cfm?TopicName=Architecture_
of_a_geodatabase.
[8] http://www.srnr.arizona.edu/rnr/rnr420/
gdb_architecture.html.
[9] http://webhelp.esri.com/arcgisdesktop/
9.2/index.cfm?TopicName=An_overview_of_
the_Geodatabase.
[10] http://webhelp.esri.com/arcgisdesktop/
9.3/index.cfm?TopicName=Table_basics.
[11] http://webhelp.esri.com/arcgisdesktop/
9.3/index.cfm?TopicName=Feature_class_
basics.
[12] http://webhelp.esri.com/arcgisdesktop/
9.3/index.cfm?TopicName=Raster_basics.
[13] http://www.esri.com/software/arcgis/
geodatabase/data-storage.
[14] http://webhelp.esri.com/arcgisdesktop/
9.2/index.cfm?TopicName=Geodatabase_
storage_in_relational_databases.
[15] http://webhelp.esri.com/arcgisdesktop/
9.2/index.cfm?TopicName=Geodatabase_
field_data_types.
[16] http://webhelp.esri.com/arcgisserver/9.
3/dotNet/geodatabases/st_geometry.gif.
[17] http://workshops.opengeo.org/
postgis-spatialdbtips/introduction.html.
[18] http://en.wikipedia.org/wiki/B-tree.
[19] http://en.wikipedia.org/wiki/R-tree.
[20] Philippe Rigaux, Michel Scholl, Agnès Voisard. Spatial Databases with Application to GIS.
[21] S. Shekhar. Spatial databases-accomplishments and
research needs. IEEE, 1999.
Typeset August 13, 2012