
GIS DATA MODELS

CONCEPTUAL MODEL OF SPATIAL INFORMATION

1. Introduction

A GIS does not store a map in any conventional sense, nor does it store a particular image or view of a geographic area. Instead, a GIS stores the data, known as geographic data, from which a desired view can be drawn to suit a particular purpose. There are two types of data in GIS: spatial data and non-spatial (attribute) data. Non-spatial data hold information about the features, for example the names of roads, schools and forests, or population and census data for the region concerned. Attribute data qualify the spatial data: they describe aspects of the spatial data that are not specified by its geometry alone. A geographical information system essentially integrates these two types of data and allows the user to derive new data for planning.

Spatial models are important because the way in which information is represented affects both the type of analysis that can be performed and the type of graphic display that can be obtained. In GIS there is a major distinction between what are usually referred to as vector GIS and raster GIS. These two approaches to spatial data processing, often found in the same GIS package, reflect two different methods of spatial modeling: the former focuses on discrete objects that are to be described, and the latter is concerned primarily with recording what is to be found at a predetermined set of locations, which may be grid cells or points.

2. Spatial Information

Spatial characteristics of information can be broadly divided into those that describe where things are, using locations consisting of reference positions, spatial units and spatial relationships; those that describe the form of phenomena, using qualitative and quantitative descriptions of shape and structure; and those that describe associations and interactions between different phenomena.

All geographical data can be reduced to three basic entities: the point, the line and the area. Every geographical phenomenon can in principle be represented by a point, line or area plus a label saying what it is. So an oil well could be represented by a point entity consisting of an XY coordinate; a road could be represented by a line entity consisting of a series of XY coordinates; a floodplain could be represented by an area entity covering a set of XY co-ordinates plus the label ‘floodplain’. The labels could be the actual names as given here, or they could be special symbols.


The essential feature of any data storage system is that it should allow data to be accessed and cross-referenced quickly. There are several ways of achieving this, some of which are more efficient than others. Unfortunately, there seems to be no single 'best' method that can be used in all situations. This explains in part the massive investment in labour and money in effective database management systems, which are the computer programs that control data input, output, storage, and retrieval from a digital database.

Figure 1: Real world phenomena represented as three basic entities

3. Layers and Coverages

The common requirement to access data on the basis of one or more classes has resulted in several GIS employing organizational schemes in which all data of a particular level of classification, such as roads, rivers or vegetation types, are grouped into layers or coverages (refer Figure 2). GIS organize spatial data into layers, and a typical layer represents information belonging to a particular class. The layers can be combined with each other in various ways to create new layers that are a function of the individual ones. Within a layer no areal regions overlap, so each region can be assigned multiple attributes corresponding to multiple perspectives on the meaning of that region.

Figure 2. Layers and Coverages


4. Data Model

In order to represent spatial information and its attributes, a data model is adopted: a set of logical definitions or rules for characterizing the geographical data. The data model represents the linkages between the real-world domain of geographical data and the computer and GIS representation of these features. As a result, the data model not only helps in organizing the real-world geographical features into a systematic storage/retrieval mechanism, but also helps in capturing the user’s perception of these features. The model:

a) Structures the data to be amenable to computer storage/retrieval and manipulation. The data structure is the core of the model, and it is on this basis that features of the real world are represented. The ability of the data structure to represent the real world completely determines the success of the model.

b) Abstracts the real world into properties that are perceived by a specific application. For example, a land use map is perceived to be made up of different classes with symbols and legends. The district information is perceived to be made up of district maps and different attribute tables.

c) Helps organize a systematic file structure, which is the internal organization of real-world data in a computer.

5. Conceptual Models of Spatial Information

Different models have influenced the way in which data are organized and processed within GIS. They are based respectively on objects, networks and fields.

Object-based model: Object-based spatial models emphasize individual phenomena that are to be studied in isolation or in terms of their relationships with other phenomena. Any phenomenon, however big or small, may be designated as an object, provided that it can be separated conceptually from neighboring phenomena (refer figure 4). Objects may be composed of other objects and they may have specific relationships with other separate objects. An object-based view is appropriate, though not confined, to phenomena that have well-defined boundaries. Hence it is suited to human-made phenomena, such as buildings, roads, utilities and administrative regions. Some natural phenomena, such as lakes, rivers, islands and forests, are often represented in object-based models because they need to be treated as discrete phenomena for some purposes.

Figure 3. The Object based Conceptual View

Networks Model: Network-based spatial models share some aspects of the object-based model in that they often deal with discrete phenomena, but their essential characteristic is the need to consider interactions between multiple objects, often along discrete paths or routes that connect them. The exact shape of the phenomena may not be of much importance. What is more important is some measure of distance, or impedance, between specified phenomena. Typical applications for which a network model is appropriate are studies of traffic on road, sea and air routes, and analysis of the flow of water, gas and electricity through pipelines and cables (refer figure 4).

Fields Model: The field-based view is appropriate for modeling phenomena that are regarded as continuously variable across some region of space. Examples of phenomena that may be treated as fields include the concentration of pollutants in the atmosphere, the temperature of the ground surface, the moisture levels in soil, and the speed and direction of flow in bodies of air and water. Fields may represent either two or three spatial dimensions, depending on the application. A field-based spatial model is often adopted when the data to be modeled are not known in sufficient detail to provide precise boundaries, even though at some resolution they could be said to exist (refer figure 4).


Figure 4. Conceptual models of spatial Data

6. Spatial Data models for GIS: Representation in computer

When geographical data are entered into a computer, users will be most at ease if the geographical information system can accept the phenomenological data structures that they have always been accustomed to using. But computers are not organized like human minds and must be programmed to represent phenomenological structures appropriately. Moreover, the way the geographical data are visualized by the user is frequently not the most efficient way to structure the computer database. Finally, the data have to be written and stored on magnetic devices that need to be addressed in a specific way.

Geographical information systems provide methods for representing spatial data that allow the user to adopt conceptual models resembling, to a large extent, the three classes of model discussed above. There are two broad categories of spatial data model: the vector data model and the raster data model.

The database concept is central to a GIS and is the main difference between a GIS and drafting or computer mapping systems, which can only produce good graphic output. All contemporary geographic information systems incorporate a database management system. Database systems provide the means of storing a wide range of geographic information and of updating it without the need to rewrite programs. In GIS, the spatial data models handle where the features are, and the non-spatial data models or database management system handle the feature descriptions and how each feature is related to the others.

Two approaches or models have been widely adopted for representing spatial data within GIS: the cartographic (composite) map model and the geo-relational model. Each of these approaches is based on a specific spatial data model. The composite map model is usually based on a tessellated (raster) representation of space, and the geo-relational model is usually associated with a vector representation of space.

The vector data model represents phenomena in terms of spatial primitives, or components, consisting of points, lines, polygons, surfaces and volumes. The raster data model represents phenomena as occupying the cells of a predefined, grid-shaped tessellation.

6.1 Raster Spatial Data Model


The grid-cell tessellation, widely known as the raster structure, is the structure most commonly adopted in GIS packages. The quadtree tessellation is another method that has been adopted by many GIS packages. Raster-based spatial models regard space as a tessellation of cells, each of which is associated with a record of the classification or identity of the phenomena that occupy it. The raster model represents the two-dimensional location of phenomena as a matrix of grid cells. Each cell stores a data item defining the entity, the class or the value of the represented phenomena. Each cell is also known as a ‘pixel’. Raster models also use a layered approach. In each layer, raster cells represent the presence or absence of phenomena of a particular class. The value of a particular cell indicates the category of phenomena within the given class.

Since the cells are of a fixed size and location, rasters tend to represent natural and human-made phenomena in a blocky fashion. The extent to which the distribution of phenomena may be misrepresented depends upon the size of the cells relative to the feature of interest. If the cells are sufficiently small relative to the feature, the raster may be a particularly effective means of representing the often somewhat random distribution of the boundaries of natural phenomena, which may tend to merge gradually into each other rather than being neatly delineated. The grid-cell or raster model is a relatively simple approach to data representation, both conceptually and operationally, and it has therefore been popular since the earliest days of GIS development. The model is currently implemented in a large number of raster-based GIS packages. The raster model discretises the geometric area of interest: the entire space is broken into grid cells of a fixed, uniform size. In this type of representation a set of cells located by coordinates is used, and each cell is addressed independently and assigned the value of an attribute.

The simplest raster data structures consist of an array of grid cells (refer figure 5). Each grid cell is referenced by a row and column number and it contains a number representing the type or value of the attribute being mapped. In raster structures a point is represented by a single grid cell, a line by a number of neighbouring cells strung out in a given direction, and an area by an agglomeration of neighbouring cells.
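The array-of-cells idea can be sketched in a few lines of Python. This is only an illustration: the 6 x 6 grid size and the class codes (0 for background, 1 for a well, 2 for a road, 3 for a floodplain) are assumptions made for the example, not values taken from the figures.

```python
# Minimal sketch of a raster layer as a row/column array of cell values.
# Grid size and class codes are illustrative assumptions:
# 0 = background, 1 = well (point), 2 = road (line), 3 = floodplain (area).
ROWS, COLS = 6, 6
grid = [[0] * COLS for _ in range(ROWS)]

# A point is represented by a single grid cell.
grid[1][4] = 1

# A line is a number of neighbouring cells strung out in a given direction.
for col in range(COLS):
    grid[3][col] = 2

# An area is an agglomeration of neighbouring cells.
for row in range(4, 6):
    for col in range(0, 3):
        grid[row][col] = 3


def count_cells(layer, value):
    """Count the cells of a layer that hold a given class value."""
    return sum(cell == value for row in layer for cell in row)


for row in grid:
    print(" ".join(str(cell) for cell in row))
```

Each cell is addressed by its row and column index, so looking up what occupies a location is a direct array access.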


Figure 5: Raster data model

Rasters are limited by the area they can represent and by the limits of storage space. The fineness of the data is limited by the cell size, so the area of coverage is traded off against the resolution of the coverage. The storage problems are handled by resorting to compact coding schemes, such as run-length encoding, chain coding and block coding.

The capabilities of a raster-based cartographic modeling system ultimately arise from functions associated with individual data-transforming operations and the way in which these operations are combined. This transformation of data is facilitated by the fact that map layer zones are represented not by lines or symbols, but by numerical values. It is also facilitated by the fact that these values are directly associated with individual locations. The use of numbers makes it possible to transform geographical characteristics using mathematical and arithmetical functions. It is also very easy to carry out overlay operations to compare attributes recorded in different layers. Each attribute associated with a grid cell can be combined logically or arithmetically with the attributes in the corresponding cells of the other layers to create a new attribute value for the resulting overlay.
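A cell-by-cell overlay of this kind can be sketched as follows. The two 3 x 3 layers (`forest` and `steep`) and the function name `overlay` are hypothetical and chosen only for illustration; the sketch assumes both layers share the same grid.

```python
def overlay(layer_a, layer_b, combine):
    """Combine two raster layers cell by cell with the function `combine`."""
    same_shape = len(layer_a) == len(layer_b) and all(
        len(row_a) == len(row_b) for row_a, row_b in zip(layer_a, layer_b)
    )
    if not same_shape:
        raise ValueError("layers must have the same rows and columns")
    return [
        [combine(a, b) for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(layer_a, layer_b)
    ]


# Hypothetical presence/absence layers on the same 3 x 3 grid.
forest = [[1, 1, 0],
          [1, 0, 0],
          [0, 0, 0]]
steep = [[0, 1, 1],
         [0, 1, 1],
         [0, 0, 1]]

# Logical overlay: cells that are both forest and steep.
both = overlay(forest, steep, lambda a, b: int(a and b))

# Arithmetic overlay: number of conditions met in each cell.
total = overlay(forest, steep, lambda a, b: a + b)

print(both)
print(total)
```

Because corresponding cells occupy the same array position, no geometric computation is needed to line the layers up.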

A limitation of the raster approach is that, if each cell is confined to a single classification, the raster model may still fail to represent adequately the transitional nature of change of some natural phenomena. Unless sampling is reduced to a microscopic level, many classes of data, such as soils, sediments and vegetation, are in fact mixtures of categories. Such fuzzy characteristics can be represented more effectively in a raster by means of mixed pixels, in which the component categories are represented by measured or expected percentages of the total composition of the cell.

6.2 Compact methods for storing raster data

When each cell has a unique value it takes a total of n rows x m columns x 3 values (X, Y coordinates and attribute value) to encode each overlay. This is the situation with the altitude matrices used for digital terrain models.

Figure 6: A simple region on rasterized map

6.2.1 Chain codes

The boundary of a region can be given in terms of its origin and a sequence of unit vectors in the cardinal directions. These directions can be numbered (east 0, north 1, west 2, south 3). For example, if we start at cell row = 10, column = 1, the boundary of the region (Figure 6) is coded clockwise by

0,1,0,0,3,0,0,1,0,3,0,1,0,0,0,3,3,2,3,3,3,0,0,1,0,0,0,0,0,3,3,2,2,3,2,2,2,3,1,2,2,1,2,2,1,2,2,1,2,2,1,1,1

Chain codes provide a very compact way of storing a region representation and they allow certain operations, such as estimation of areas and perimeters or detection of sharp turns and concavities, to be carried out easily. On the other hand, overlay operations such as union and intersection are difficult to perform without returning to a full grid representation. Another disadvantage is the redundancy introduced because all boundaries between regions must be stored twice.

6.2.2 Run-length codes

Run-length codes allow the points in each mapping unit to be stored per row in terms, from left to right, of a begin cell and an end cell. For the area shown in Figure 6 the codes would be as follows:

Row 9: 2,3 6,6 8,10
Row 10: 1,10
Row 11: 1,9
Row 12: 1,9
Row 13: 3,9 12,16
Row 14: 5,16
Row 15: 7,14
Row 16: 9,11

In this example, the 69 cells of Figure 6 have been completely coded by 22 numbers, thereby effecting a considerable reduction in the space needed to store the data.

Clearly, run-length coding is a considerable improvement in storage requirements over conventional methods whenever the many-to-one relations are present. It is especially suitable for use in small computers and where total volumes of data must be kept limited. On the other hand, too much data compression may lead to increased processing requirements during cartographic processing and manipulation. Run length codes are also useful in reducing the volume of data that need to be input to a simple raster database.
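The figures quoted for the run-length example (69 cells coded by 22 numbers) can be checked with a short sketch. The run list below is copied from the rows given above; the helper names and the assumed grid width of 16 columns (the largest end cell in the example) are illustrative choices, not part of the original description.

```python
# Run-length codes from the example: row -> list of (begin, end) cells.
runs = {
    9: [(2, 3), (6, 6), (8, 10)],
    10: [(1, 10)],
    11: [(1, 9)],
    12: [(1, 9)],
    13: [(3, 9), (12, 16)],
    14: [(5, 16)],
    15: [(7, 14)],
    16: [(9, 11)],
}


def cell_count(coded):
    """Total number of cells covered by all runs."""
    return sum(end - begin + 1
               for row_runs in coded.values()
               for begin, end in row_runs)


def stored_numbers(coded):
    """Numbers needed to store the runs (a begin and an end per run)."""
    return sum(2 * len(row_runs) for row_runs in coded.values())


def decode_row(row_runs, n_cols):
    """Expand the runs of one row into a 0/1 list (cells numbered from 1)."""
    row = [0] * n_cols
    for begin, end in row_runs:
        for col in range(begin, end + 1):
            row[col - 1] = 1
    return row


def encode_row(row):
    """Encode a 0/1 row as (begin, end) runs of 1s, numbered from 1."""
    out, begin = [], None
    for col, value in enumerate(row, start=1):
        if value and begin is None:
            begin = col
        elif not value and begin is not None:
            out.append((begin, col - 1))
            begin = None
    if begin is not None:
        out.append((begin, len(row)))
    return out


print(cell_count(runs), stored_numbers(runs))
```

Decoding a row and encoding it again returns the original runs, which is the sense in which the compact form loses no information.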

6.2.3 Block codes

The idea of run-length codes can be extended to two dimensions by using square blocks to tile the area to be mapped. Fig. 2.12 shows how this can be done for the raster map of Figure 6. The data structure consists of just three numbers per square: the origin (the centre or bottom left), given by two coordinates, and the radius of the square. This is called a medial axis transformation or MAT (Rosenfeld 1980). The region shown in Figure 6 can be stored by 17 unit squares + 9 4-squares + 1 16-square. Given that two coordinates are needed for each square, the region can be stored using 57 numbers (54 for coordinates and 3 for cell sizes). Clearly, the larger the square that can be fitted in any given region and the simpler the boundary, the more efficient block coding becomes. Both run-length and block codes are clearly most efficient for large simple shapes and least so for small complicated areas that are only a few times larger than the basic cell. The medial axis transformation has advantages for performing union and intersection of regions and for detecting properties such as elongation (Rosenfeld 1980).
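The block-code arithmetic above can be verified with a small calculation. The sketch assumes that a "4-square" is a 2 x 2 block and a "16-square" a 4 x 4 block, which is the reading under which the blocks add up to the 69 cells of the region; the variable names are illustrative.

```python
# Block decomposition quoted in the text: side length -> number of squares.
# Assumption: a "4-square" is a 2 x 2 block and a "16-square" a 4 x 4 block.
blocks = {1: 17, 2: 9, 4: 1}

# Cells covered by all blocks together.
cells_covered = sum(side * side * count for side, count in blocks.items())

# Two coordinates (the origin) are stored for every square ...
n_squares = sum(blocks.values())
coordinate_numbers = 2 * n_squares

# ... plus one number per distinct cell size.
size_numbers = len(blocks)

total_numbers = coordinate_numbers + size_numbers
print(cells_covered, coordinate_numbers, size_numbers, total_numbers)
```

For this small, irregular region the 57 numbers of the block code exceed the 22 of the run-length code, in line with the remark that block coding pays off mainly for large, simple shapes.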

6.2.4 Quadtree tessellation

A more elegant tessellation is the quadtree, in which the geographical area is decomposed into four equal quadrants and the decomposition continues until each quad represents a homogeneous unit. The number of times the decomposition process may be applied, known as the resolution of decomposition, may either be fixed a priori or be determined purely by the input data. The storage requirement of a quadtree is much lower than that of a raster having the resolution of the smallest quad element.
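The recursive decomposition can be sketched as below. This is a minimal illustration, assuming a square grid whose side is a power of two; the 4 x 4 test grid, the nested-list tree representation and the quadrant order (NW, NE, SW, SE) are assumptions made for the example.

```python
def build_quadtree(grid, row=0, col=0, size=None):
    """Decompose a square block of `grid` into homogeneous quads.

    Returns the cell value for a homogeneous block, otherwise a list of
    four sub-trees in the order NW, NE, SW, SE.
    Assumes the grid is square with a power-of-two side length.
    """
    if size is None:
        size = len(grid)
    first = grid[row][col]
    homogeneous = all(
        grid[r][c] == first
        for r in range(row, row + size)
        for c in range(col, col + size)
    )
    if homogeneous:
        return first
    half = size // 2
    return [
        build_quadtree(grid, row, col, half),                # NW
        build_quadtree(grid, row, col + half, half),         # NE
        build_quadtree(grid, row + half, col, half),         # SW
        build_quadtree(grid, row + half, col + half, half),  # SE
    ]


def count_leaves(node):
    """Number of homogeneous quads stored in the tree."""
    if isinstance(node, list):
        return sum(count_leaves(child) for child in node)
    return 1


# Hypothetical 4 x 4 raster.
grid = [
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 0, 1],
    [1, 1, 0, 0],
]
tree = build_quadtree(grid)
print(tree, count_leaves(tree))
```

Here 7 leaves stand in for 16 cells; the saving grows with the size of the homogeneous regions.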

Figure 7: Quadtree Tessellation

The tessellation into quads results in a tree in which each node has four sub-nodes, hence the name quadtree. Quadtree variants have been designed for areas, lines and points, to represent polygon features, line features and point features, respectively (refer figure 7).

6.3 Vector Spatial Data model

In the second and more intensely developed approach to information integration, attribute information is associated with points, lines and polygons as spatial entities that describe features occurring in the real world. Thus, for example, a point feature such as a city may have associated with it such items of information as its total population, number of houses, number of schools and so on. Similarly, a linear feature such as a river might have associated with it such information as its name, mean discharge etc. A polygonal feature such as a land use category might be linked to information describing its use, past land use, soil type and so forth.

The vector representation of an object is an attempt to represent the object as exactly as possible. The co-ordinate space is assumed to be continuous, not quantised as with the raster space, allowing all positions, lengths and dimensions to be defined precisely. Vector data models treat phenomena as sets of primitive and composite spatial entities. In 2D models the primitive entities are points, lines and areas. The vector data structure represents each geographical feature by a set of coordinates: vectors, as x,y coordinates, define points, lines and polygons. The basic premise of vector-based structuring is to define a 2-dimensional space in which coordinates on the two axes represent features. Generally, representing points and lines is straightforward: a point is characterized by an x,y coordinate pair, and a line by a set of x,y coordinate pairs with a specific beginning and ending vector. Representing polygons in vector storage, however, poses a challenge. The three vector structures used in Geographical Information Systems (GIS) for the storage of points, lines and areas are:

Point entities

Point entities can be considered to embrace all geographical and graphical entities that are positioned by a single XY co-ordinate pair. Besides the XY co-ordinates, other data must be stored to indicate what kind of 'point' it is, together with the other information associated with it (figure 8).

Line entities

Line entities can be defined as all linear features built up of straight line segments made up of two or more co-ordinates.
The simplest line requires the storage of a begin point and an end point (two XY co-ordinate pairs) plus a possible record indicating the display symbol to be used. An 'arc', a 'chain' or a 'string' is a set of n XY co-ordinate pairs describing a continuous complex line. The shorter the line segments, and the larger the number of XY co-ordinate pairs, the more closely the chain will approximate a complex curve. Data storage space can be saved at the expense of processing time by storing a number that indicates that the display driver routine should fit a mathematical interpolation function to the stored co-ordinates when the line data are sent to the display device. As with 'point' and simple line entities, chains can be stored with data records indicating the type of display line symbols to be used.
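Point and line entities of this kind can be sketched as simple records. The field names (`kind`, `symbol`, `xy`), the coordinates and the helper `chain_length` are hypothetical and serve only to show how coordinates and descriptive data sit side by side.

```python
import math

# A point entity: one XY pair plus data saying what kind of point it is.
# All values are illustrative.
well = {"kind": "oil well", "symbol": "circle", "xy": (3.0, 4.0)}

# A chain (arc, string): n XY pairs describing a continuous line,
# plus a record of the display symbol to be used.
road = {
    "kind": "road",
    "symbol": "dashed",
    "xy": [(0.0, 0.0), (3.0, 4.0), (6.0, 4.0)],
}


def chain_length(coords):
    """Length of a chain as the sum of its straight segments."""
    return sum(math.dist(a, b) for a, b in zip(coords, coords[1:]))


print(chain_length(road["xy"]))
```

Because the co-ordinates are continuous values rather than cell indices, lengths can be computed directly from the stored pairs.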

Figure 8: Geometric primitives categorized as point, line and area

Area entities

There are several possible vector structures for polygons. The simplest is the spaghetti representation, which is nothing but an extension of the simple chain: each polygon is represented as a set of XY co-ordinates on its boundary. In other words, the polygon is reduced to a line representation characterized by a set of XY co-ordinate pairs whose beginning and ending vectors are the same, that is, a self-closing line. The names or symbols used to tell the user what each polygon is are then held as a set of simple text entities (refer Figure 8).

6.4 Topology

An important aspect of vector-based models is that they enable individual components to be isolated for the purpose of carrying out measurements of, for example, area and length, and for determining the spatial relationships between the components. Spatial relationships of connectivity and adjacency are examples of topological relationships, and a GIS spatial model in which these relationships are explicitly recorded is described as topologically structured. In a fully topologically structured data set, wherever lines or areas cross each other, nodes will be created at the intersections and new areal subdivisions defined. In two dimensions,


this may be regarded as part of the process of planar enforcement referred to previously. Vector-based spatial data models that are topologically structured are often described in a terminology of topological objects or primitives, which are classified in terms of topological dimensions, where an n-cell topological object is of n-dimensionality. Two-dimensional topological objects consist of polygons (faces or areas), arcs and nodes. For areal information, adjacency between the areas can be recorded in terms of feature codes associated with the left and right sides of arcs. The expressions 'left' and 'right' are given meaning in this context by specifying the direction of the arc in terms of a 'from node' and a 'to node'. The composition of each polygon can be defined by listing the component arcs, including a negative sign where necessary to ensure consistency in arc direction. In order to distinguish between external and internal boundaries, a convention of clockwise for the former and anticlockwise for the latter (or vice versa), can be adopted. For the purpose of network analysis, each node may be associated with a list of the arcs, which it bounds. The list of arcs connected to each node will generally be in a predetermined order, namely clockwise or anticlockwise. Topological structure is important in keeping track of the components of complex objects and in determining the spatial relationships of connectivity and adjacency between recorded phenomena. Thus if two lines cross each other they will share a common node. If two areas are adjacent to each other, such as two neighboring counties, they will share a common boundary arc. If the boundary of a county coincides with the path of a river they might also share the same arc. The inclusion of one area in another, such as a specific type of forest within a county, will result in their sharing common polygons. 
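An arc-node structure of this kind can be sketched with two small tables. The two polygons A and B, the node names and the arc numbering are hypothetical; the sketch assumes arcs are listed anticlockwise around each polygon, with a negative sign meaning the arc is traversed against its stored direction.

```python
# Arc table: arc id -> (from_node, to_node, left_polygon, right_polygon).
# Hypothetical layout: polygons A and B share arc 1.
arcs = {
    1: ("n1", "n2", "A", "B"),
    2: ("n2", "n1", "A", "outside"),
    3: ("n1", "n2", "B", "outside"),
}

# Polygon table: component arcs; a negative id means "traverse reversed".
polygons = {"A": [1, 2], "B": [-1, 3]}


def neighbours(polygon):
    """Polygons adjacent to `polygon`, found by comparing identifiers only."""
    found = set()
    for _, _, left, right in arcs.values():
        if left == polygon:
            found.add(right)
        elif right == polygon:
            found.add(left)
    return found


def shared_arcs(poly_a, poly_b):
    """Arcs that form the common boundary of two polygons."""
    return [arc_id for arc_id, (_, _, left, right) in arcs.items()
            if {left, right} == {poly_a, poly_b}]


def arcs_at_node(node):
    """Arcs that begin or end at a node (connectivity)."""
    return sorted(arc_id for arc_id, (start, end, _, _) in arcs.items()
                  if node in (start, end))


def boundary_nodes(polygon):
    """Node sequence obtained by following the polygon's signed arcs."""
    nodes = []
    for signed_id in polygons[polygon]:
        start, end, _, _ = arcs[abs(signed_id)]
        if signed_id < 0:
            start, end = end, start
        if not nodes:
            nodes.append(start)
        nodes.append(end)
    return nodes


print(neighbours("A"), shared_arcs("A", "B"), boundary_nodes("B"))
```

Adjacency and connectivity are answered here without touching any coordinates, and the shared arc is stored only once.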
The presence of these various spatial relationships can be determined by relatively simple comparisons of the identifiers of their topological components, rather than requiring possibly computationally demanding geometric calculations based on coordinates. It may also be noted that because shared spatial objects are only stored once, though perhaps referenced many times, storage space is saved by avoiding duplication of the same geometric data. This in turn assists in the maintenance of the integrity of the database by avoiding the possibility of two different versions of the same geometric components.

7. The choice between Raster and Vector


The raster and vector methods for spatial data structures are distinctly different approaches to modeling geographical information, but are they mutually exclusive? Only a few years ago, the conventional wisdom was that raster and vector data structures were irreconcilable alternatives. They were irreconcilable because raster methods required huge computer memories to store and process images at the level of spatial resolution obtained by vector structures. Certain kinds of data manipulation, such as polygon intersection or spatial averaging, presented enormous technical problems. The choice was between raster methods, which allowed easy spatial analysis but resulted in ugly maps, and vector methods, which could provide databases of manageable size and elegant graphics but in which spatial analysis was extremely difficult.

Vector methods

Advantages:
- Good representation of phenomenological data structure
- Compact data structure
- Topology can be completely described with network linkages
- Accurate graphics
- Retrieval, updating and generalization of graphics and attributes are possible

Disadvantages:
- Complex data structures
- Combination of several vector polygon maps, or of polygon and raster maps, through overlay creates difficulties
- Simulation is difficult because each unit has a different topological form
- Display and plotting can be expensive, particularly for high quality, colour and cross-hatching
- The technology is expensive, particularly for the more sophisticated software and hardware
- Spatial analysis and filtering within polygons are impossible

Raster methods

Advantages:
- Simple data structures
- The overlay and combination of mapped data with remotely sensed data is easy
- Various kinds of spatial analysis are easy
- Simulation is easy because each spatial unit has the same size and shape
- The technology is cheap and is being energetically developed

Disadvantages:
- Large volumes of graphic data
- The use of large cells to reduce data volumes means that phenomenologically recognizable structures can be lost and there can be a serious loss of information
- Crude raster maps are considerably less beautiful than maps drawn with fine lines
- Network linkages are difficult to establish
- Projection transformations are time consuming unless special algorithms or hardware are used

The problem of raster or vector disappears once it is realized that both are valid methods for representing spatial data, and that the two structures are inter-convertible. Conversion from vector to raster is the simpler, and there are many well-known algorithms (e.g. Pavlidis 1982). Vector to raster conversions are now performed automatically in many display screens by inbuilt microprocessors. The reverse operation, raster to vector, is also well understood (Pavlidis lists four algorithms for thinning bands of pixels to lines), but it is a much more complex operation, complicated by the need to reduce the number of co-ordinates in the resulting lines by a process known as weeding.

8. Suggestions For The Use Of Raster And Vector Methods

1. Use VECTOR data structures for archiving phenomenologically structured data (e.g. soil areas, land use units, etc.).

2. Use VECTOR methods for network analyses, such as for telephone networks or transport network analysis.

3. Use VECTOR data structures and VECTOR display methods for the highest quality line drawing.

4. Use RASTER methods for quick and cheap map overlay, map combination and spatial analysis.

5. Use RASTER methods for simulation and modeling when it is necessary to work with surfaces.

6. Use RASTER and VECTOR in combination for plotting high quality lines in combination with efficient area filling in colour. The lines can be held in VECTOR format and the raster filling in compact RASTER structures such as run-length codes or quadtrees.

7. Preferably use compact VECTOR data structures for digital terrain models, but don't neglect altitude matrices.

8. Use RASTER-VECTOR and VECTOR-RASTER algorithms to convert data to the most suitable form for a given analysis or manipulation.

9. Remember that DISPLAY systems can operate in either RASTER or VECTOR mode, independently of the DATA STRUCTURES that are used to store and manipulate the data.

CONCEPTUAL MODEL OF NON-SPATIAL INFORMATION

1. Non-spatial Information and concept of Database

Non-spatial information, also known as attribute data, is the descriptive data that defines spatial data. Data are the raw material from which every land information system is built. They are gathered and assembled into records and files. A database is a collection of data that can be shared by different users. It is a group of records and files that are organized so that there is little or no redundancy. A database consists of data in many files; in order to be able to access data from one or more files easily, it is necessary to have some kind of structure or organization. Three main kinds of database structure are commonly recognized: hierarchical, network and relational.

2. Data Base management Systems

Because the cost of maintaining a database increases with its size, care must be taken when working with large quantities of data to ensure that the same information is not duplicated unnecessarily. Avoiding such data redundancy helps to serve another important database function, which is to achieve consistency, whereby data values referring to the same entity are not contradictory, as could happen if a value were updated in one part of the database but not in another.

Consistency may itself be regarded as part of a larger problem of data integrity. This concerns the correctness of the database contents. In general, a database cannot 'know' whether its data values are correct, but it can perform certain checks, such as for unrealistic values and for data items that have been corrupted by the computer hardware. Checks on the initial validity of data can be performed when data are entered.

The fact that there will not in general be any automatic mechanism to prove that the contents of the database are correct means that it may be important to impose restrictions upon the personnel entitled to make changes to the database contents. This includes the addition of new data and the deletion of existing data. This is one aspect of database security. Another aspect of security concerns control over who can retrieve information from the database and from particular parts of the database.

A DBMS can be regarded as a tool for representing, in a computer, a real-world oriented model of a set of data. This relatively high-level representation, or abstraction, is referred to as a conceptual model, and its specific description in the database is often referred to as a schema or a conceptual schema. The database tool for defining the conceptual schema is a data definition language. The syntax of this language differs somewhat between different database systems.

Most commercial database management systems implement one of three widely recognized data models, also referred to as logical models or internal models. These are the hierarchical, the network and the relational model. More recently, object-oriented and logic-based, or deductive, models have been introduced but are not yet widely used, although some of the associated techniques have been introduced in 'post-relational' database management systems. Commercial database management systems are conventionally categorised as hierarchical, network, relational or object-oriented. The hierarchical data model is currently the least widely represented (IBM's IMS is an example). It can be regarded to some extent as a specialisation of the network model, for which several commercial systems are currently available. The relational model is very much the most widely implemented for commercial applications and it is characterised by quite simple concepts of data organization and related query languages. Queries to the database are expressed in terms of the data entities and attributes that constitute the conceptual model and its local subsets, which are the external models. Execution of a query results in a request to the database manager software to find the named data items or classes of data items.

3. Hierarchical Data Base Structure

A hierarchical file is a case of a tree structure. The tree is composed of a hierarchy of nodes; the uppermost node is called the root. With the exception of this root, every node is related to a node at a higher level called its parent. No element can have more than one parent, though it can have more than one lower-level element, called children. A hierarchical file is one with a tree-structure relationship between the records, for example a master-detail file with two record types. Such a representation is often very convenient because much data tends to be hierarchical in nature or can easily be cast into this structure.

Figure 1.: Hierarchical Database

The hierarchical approach is very efficient if all desired access paths follow the parent-child linkages. However, it requires a relatively inflexible structure to be placed on the problem at the outset, when the record types constituting the tree structure are set up. The combination of this inflexible structure and the overheads of maintaining or changing the pointer system makes extensive modification of the structure of hierarchical systems to meet new requirements a resource-intensive operation. These reasons have contributed to the lack of adoption of this type of DBMS for flexible GIS requirements.
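The parent-child linkage can be sketched with a small tree in which every node stores exactly one parent pointer. The node names are taken from the labels of the hierarchical figure, but their arrangement here is an assumption, and the `Node` class is purely illustrative.

```python
class Node:
    """One record in a hierarchical structure: a single parent, any number of children."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def path_from_root(self):
        """Follow parent pointers upwards and return the access path from the root."""
        node, path = self, []
        while node is not None:
            path.append(node.name)
            node = node.parent
        return list(reversed(path))


# Assumed arrangement of the labels: Department is the root.
department = Node("Department")
employee = Node("Employee", parent=department)
job_history = Node("Job History", parent=employee)
education = Node("Education", parent=employee)

path = job_history.path_from_root()
```

Access along the tree is a simple pointer walk, but asking for, say, every record named "Education" regardless of its parent would require visiting the whole tree, which reflects the inflexibility noted above.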

4. Network Structure

A network structure exists when a child in a data relationship has more than one parent. An item in such a structure can be linked to any other item. The physical data organization needed to support complex network structures is far more difficult to develop than that for simple structures.

[Figure node labels: Department; Job Description, Employee; Education Required, Background Required; Job History, Education]

[Figure node labels: Author 1, Author 2; Book 1, Book 2, Book 3]

Figure 2.: Network Structure

Each entity set with its attributes is considered to be a node in the network. Relationship sets are represented as linkages in the form of pointers between individual entities in different entity sets. As a result, all the different forms of mapping (one-to-many, many-to-many, etc.) can be handled directly with a large number of pointers. The network approach is powerful and flexible. For many applications, it is also very fast and efficient in terms of CPU resources. From the implementation point of view, it may be comparatively difficult to set up the database correctly, and although the query language is comprehensive, it may also be complex and confusing for less expert users. Major restructuring of the database may be time consuming because of the extensive pointer structures that have to be rebuilt.
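A many-to-many mapping held as pointers between individual entities might look like the sketch below. The author and book names come from the labels of Figure 2; which author is linked to which book is an assumption, and the `Entity` class and `link` helper are hypothetical.

```python
class Entity:
    """A node in the network; 'links' holds pointers to related entities."""

    def __init__(self, name):
        self.name = name
        self.links = []


def link(a, b):
    """Record the relationship in both directions, as a pair of pointers."""
    a.links.append(b)
    b.links.append(a)


def names(entities):
    return sorted(e.name for e in entities)


author1, author2 = Entity("Author 1"), Entity("Author 2")
book1, book2, book3 = Entity("Book 1"), Entity("Book 2"), Entity("Book 3")

# Assumed links: Book 2 has two parents, which a strict hierarchy cannot express.
link(author1, book1)
link(author1, book2)
link(author2, book2)
link(author2, book3)

authors_of_book2 = names(book2.links)
```

Every relationship costs two stored pointers here, which hints at why restructuring a large network database means rebuilding many linkages.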

5. The Relational Model

The main data storage concept in the relational model is a table of records, referred to as a relation, or simply a table. The records in a table contain a fixed number of fields, which must all be different from each other, and all records are of identical format. There is, therefore, a simple row and column structure. In relational database terminology the rows, or records, are also referred to as tuples, while the columns of fields are sometimes referred to as domains. Each record of a table stores an entity or a relationship and is uniquely identified by means of a primary key which consists of one field, or a combination of two or more fields in the record. The need for composite keys, consisting of more than one field, arises if no one field can be guaranteed unique. The fields of an entity table store attributes of the entity to which the table corresponds. Table 1 illustrates an example for Settlement.

Settlement name   Settlement status   Settlement population   County name
Gittings          Village             243                     Downshire
Bogton            Town                31520                   Downshire
Puffings          Village             412                     Binglia
Pondside          City                112510                  Mereshire
Craddock          Town                21940                   Binglia
Bonnet            Town                28266                   Binglia
Drain             Village             940                     Mereshire

Table 1.: Example of Relational Database

In this type, data are organized in two-dimensional tables; such tables are easy for a user to develop and understand. This structure can be described mathematically, a most difficult task for other types of data structure. These structures are called relational structures because each table represents a relation. Since different users see different sets of data and different relationships between them, it is necessary to extract subsets of the table columns for some users and to join tables together for others to form larger tables. The mathematics provides the basis for extracting some columns from the tables and for joining various columns. This capability to manipulate relations provides flexibility that is normally not available in hierarchical or network structures.

For purposes of handling spatial data there is a problem concerning the definition of records in a relational database. The records are intended to store a set of data fields of different types. Several important entities in spatial data consist of sets of data items of the same type, such as the coordinates making up a line or the arcs making up a polygon. In a standard relational database such data items of the same type must be stored in separate records, the consequence of which can be overheads in storage space and poor performance in accessing all the data items, such as coordinates, that constitute a logical entity such as a line. Relational databases are sometimes used in GIS in combination with special-purpose files.

Relational systems are characterized by simplicity, in that all the data are represented in tables (relations) of rows and columns. From the database design viewpoint, entity relationship modeling fits very closely with relational systems. Each entity set is represented by a table, while each row or 'tuple' in the table represents the data for an individual entity. Each column holds data on one of the attributes of the entity set. Since relationships between entities are directly represented as tables, there is no requirement for pointers or linkages between data records to be set up, as was the case with hierarchical or network systems.
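The problem of storing the coordinates of a line as separate records can be illustrated with a small SQLite table. The table name `line_vertex`, its columns and the coordinate values are all hypothetical; the point is only that one logical line is spread over several rows and must be reassembled by a query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One row per vertex; the composite key (line_id, seq) identifies each record.
conn.execute("""
    CREATE TABLE line_vertex (
        line_id INTEGER,
        seq     INTEGER,
        x       REAL,
        y       REAL,
        PRIMARY KEY (line_id, seq)
    )
""")
conn.executemany(
    "INSERT INTO line_vertex VALUES (?, ?, ?, ?)",
    [
        (1, 0, 0.0, 0.0), (1, 1, 3.0, 4.0), (1, 2, 6.0, 4.0),
        (2, 0, 1.0, 1.0), (2, 1, 2.0, 5.0),
    ],
)


def fetch_line(line_id):
    """Reassemble one line from its vertex records, in drawing order."""
    rows = conn.execute(
        "SELECT x, y FROM line_vertex WHERE line_id = ? ORDER BY seq", (line_id,)
    )
    return [(x, y) for x, y in rows]


line_1 = fetch_line(1)
```

A line with thousands of vertices would need thousands of such rows, which is the storage and access overhead described above.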

5.1 Relational operators

Retrieval from a relational database involves creating, perhaps temporarily, new relations which are subsets or combinations of the permanently stored relations. There are several relational algebra operators that can be used to search and manipulate relations in order to perform such retrievals. Some of these operators are selection, projection, union and join. Other standard operators include product, divide and intersection. From the user's point of view, the operators are not named as such but are implemented by means of the standard Structured Query Language (SQL) using a number of commands and key words. For example, the projection command

    SELECT settlement_name, county_name FROM Settlement

will create a new table which consists only of the settlement name and county fields of the Settlement table. (Underscores are used in the field names because a hyphen is not valid in an unquoted SQL identifier.) The selection (or restrict) operation is concerned with retrieving a subset of the records of a table on the basis of retrieval criteria expressed in terms of the contents of one or more of the fields in each record. For example, to retrieve all settlements in the county of Mereshire with a population greater than 20000, the SQL command would be

    SELECT * FROM Settlement
    WHERE county_name = 'Mereshire' AND settlement_population > 20000

Note that the WHERE condition consists of a logical expression. This query could have been combined with a projection operation by specifying field names after the SELECT command in place of the asterisk. The join operator is more complicated than projection and selection in that its purpose is to combine fields from two or more tables. The operator depends on the tables being related to each other by means of a common field.

5.2 Important features of Relational Data Bases

- Primary and Foreign keys
- Relational joins
- Normal forms
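The projection and selection commands of section 5.1 can be tried against the Settlement data of Table 1 with Python's standard `sqlite3` module. This is a sketch only: the column names use underscores rather than hyphens, since hyphens are not valid in unquoted SQL identifiers, and the choice of `settlement_name` as primary key is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Settlement (
        settlement_name       TEXT PRIMARY KEY,
        settlement_status     TEXT,
        settlement_population INTEGER,
        county_name           TEXT
    )
""")
conn.executemany(
    "INSERT INTO Settlement VALUES (?, ?, ?, ?)",
    [
        ("Gittings", "Village", 243, "Downshire"),
        ("Bogton", "Town", 31520, "Downshire"),
        ("Puffings", "Village", 412, "Binglia"),
        ("Pondside", "City", 112510, "Mereshire"),
        ("Craddock", "Town", 21940, "Binglia"),
        ("Bonnet", "Town", 28266, "Binglia"),
        ("Drain", "Village", 940, "Mereshire"),
    ],
)

# Projection: keep only two of the four columns.
projection = conn.execute(
    "SELECT settlement_name, county_name FROM Settlement"
).fetchall()

# Selection: keep only the rows that satisfy the logical expression.
selection = conn.execute(
    "SELECT * FROM Settlement WHERE county_name = ? AND settlement_population > ?",
    ("Mereshire", 20000),
).fetchall()
```

With the Table 1 data, the selection returns only the Pondside record, since Drain is in Mereshire but has a population below 20000.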

5.2.1 The Primary and Foreign Keys

The relational approach has important implications for the design of database tables. Since each table or relation represents a set, it cannot have any rows whose entire contents are duplicated. Secondly, as each row must be different from every other, it follows that a value in a single column, or a combination of values in multiple columns, can be used to define a primary key for the table, which allows each row to be uniquely identified. The uniqueness property allows the primary key to serve as the sole row-level addressing mechanism in the relational database model.

A field that stores the key of another table is called a foreign key. It is important to realize that the primary key of a table and any foreign keys that it may store consist of logical data items, which may be attributes such as names or some allocated numerical identifier. They do not consist of physical addresses in the database. They will, however, be used as the basis of indexing mechanisms which the database management system uses to provide efficient query processing.

5.2.2. Relational joins

The mechanism for linking data in different tables is called a relational join. Values in a column or columns in one table are matched to corresponding values in a column or columns in a second table. Matching is frequently based on a primary key in one table linked to a column in the second, which is termed a foreign key. An example of the join mechanism is shown in Figure 3.
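A join on a primary key and a foreign key can also be sketched in SQL. The example below reuses three settlements from Table 1 and joins them to a hypothetical `County` table; the `area_km2` column and its values are invented for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# County.county_name is the primary key; Settlement.county_name is the foreign key.
conn.execute("CREATE TABLE County (county_name TEXT PRIMARY KEY, area_km2 REAL)")
conn.execute("""
    CREATE TABLE Settlement (
        settlement_name TEXT PRIMARY KEY,
        county_name     TEXT REFERENCES County (county_name)
    )
""")
conn.executemany(
    "INSERT INTO County VALUES (?, ?)",
    [("Downshire", 1200.0), ("Binglia", 950.0), ("Mereshire", 1430.0)],  # invented areas
)
conn.executemany(
    "INSERT INTO Settlement VALUES (?, ?)",
    [("Bogton", "Downshire"), ("Puffings", "Binglia"), ("Drain", "Mereshire")],
)

# The join matches the foreign key values against the primary key values.
joined = conn.execute("""
    SELECT s.settlement_name, c.county_name, c.area_km2
    FROM Settlement AS s
    JOIN County AS c ON s.county_name = c.county_name
    ORDER BY s.settlement_name
""").fetchall()
```

The keys are ordinary data values (county names), not physical addresses, so the link between the two tables exists only when the query is run.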

[Figure 3 labels: one table with columns Name, Designation and Emp. Code (entries MARTIN, KEN, EASAN; PROFESSOR, READER; 1107, 1205) is joined to a second table with columns Name, Salary and Experience (years) (entry MARTIN, 12).]

Figure 3.: Example of the join mechanism

5.2.3 Normal Forms

A certain amount of necessary data redundancy is implicit in the relational model because the join mechanism matches column values between tables. Without careful design, unnecessary redundancy may be introduced into the database. One of the tasks of a database designer is to reduce all information to normalised form. Thus a relational database table can be regarded as representing a set of entities, each of which is stored in a record of the table. Alternatively, a table can represent a relationship which links key fields of associated entities. There are several degrees of normalisation. They differ in various respects, including the extent to which data items within a record are dependent upon each other, as opposed to having an independent

The first normal form requires that all tables contain rows and columns and that column values be atomic, that is, they do not contain repeating groups of data, such as multiple values of a census variable for different years. The second normal form requires that every column which is not part of the primary key be fully dependent on the primary key. The third normal form requires that every non-primary-key column be non-transitively dependent on the primary key. The fundamental working rule for most circumstances is to ensure that each attribute of a table represents a fact about the primary key, the whole primary key and nothing but the primary key. While this is entirely valid from the design viewpoint, it must also be said that practical implementation requirements may, on occasion, override theoretical considerations and lead to tables being merged and denormalized, usually for performance reasons.

5.2.4. Advantages and disadvantages of relational systems

The Advantages can be summarized as follows:

- Rigorous design methodology based on sound theoretical foundations.
- All the other database structures can be reduced to a set of relational tables, so they are the most general form of data representation.
- Ease of use and implementation compared to other types of system.
- Modifiability, which allows new tables and new rows of data within tables to be added without difficulty.
- Flexibility in ad-hoc data retrieval because of the relational joins mechanism and powerful query language facility.

The Disadvantages are as follows:

- A greater requirement for processing resources with increasing numbers of users on a given system than with the other types of database.
- On heavily loaded systems, queries involving multiple relational joins may give slower response times than are desirable. This problem can largely be mitigated by effective use of indexing and other optimization strategies, together with the continued improvements in price performance in computing hardware from mainframes to PCs.

The DBMS provides a wide range of ready-made data manipulation tools, so programming effort can be concentrated on algorithms for spatial analysis and user interface requirements. Though a database approach has several advantages over a file system approach, GIS system designers prefer the latter approach for storage of digital map coordinates. This has led to the development of two different approaches to implementation, based on either a hybrid or an integrated data model.

6. Object Oriented Databases

A recent trend in both software engineering and in database design is towards the use of object-oriented techniques. For the purposes of geographical databases these techniques are of great interest since they hold the promise of overcoming significant shortcomings, from the point of view of GIS, of the widely used relational database methods. Normal queries to a GIS require spatial data processing operations which the standard relational query languages cannot currently handle. Object-oriented techniques provide the tools for building databases which, unlike relational databases, model complex spatial objects. The database representations of objects include, in addition to stored data, specialized procedures for spatial searching and for executing queries which may require geometric and topological data processing.

Objects in an object-oriented database are intended to correspond to classes of real-world object and are implemented by combining data, which describe the object attributes, with the procedures, or methods, which operate on them. Accessing an object involves sending a message to it, which results in the addressed object using its internal methods to respond to the message. A variety of types of message may be sent to an individual object, depending upon its properties and the methods that it has implemented. Examples of the types of message that might be sent to a polygon class of object would be to return its coordinates, to return the result of a measurement, such as an area or perimeter calculation, or to display the polygon on a graphics device. An individual object is an instantiation, or a particular example, of a class of objects, and as such it is uniquely identified within the database with an object identifier. An object class may inherit the properties, data attributes and methods of one or more other object classes. Thus, having defined typical object classes, new ones may be created which are combinations of, or subclasses of, existing ones.

7. Considerations In Adopting A Data Base Approach

The adoption of a database approach to data management is a major decision that impacts on every facet of CBIS. The decision dictates a virtually irreversible course of system design, and for existing file-oriented systems it may involve some temporary disruption of services and no small risk to file integrity during the conversion process.

7.1 Advantages of the Data Base Approach :

1. Redundancy in data storage is reduced
2. Data maintenance is simplified
3. Processing time is reduced
4. Internal consistency among data is improved
5. Data can be shared by many applications

7.2 Disadvantages of Data Base Approach:

1. Costs are high
2. Security is difficult to maintain
3. Consequences of security breaches may be severe
4. Greater control over data is required

SPATIAL DATA ANALYSIS AND MODELLING

INTRODUCTION TO SPATIAL DATA ANALYSIS

Introduction

Geographic analysis allows us to study and understand real-world processes by developing and applying manipulation and analysis criteria and models, and to carry out integrated modelling. These criteria illuminate underlying trends in geographic data, making new information available. A GIS enhances this process by providing tools which can be combined in meaningful sequence to reveal new or previously unidentified relationships within or between data sets, thus improving our understanding of the real world. The results of geographic analysis can be communicated in the form of maps, reports or both. Integration involves bringing together diverse information from a variety of sources and analysing multi-parameter data to provide answers and solutions to defined problems.

Spatial analysis is a vital part of GIS. It can be done in two ways: vector-based and raster-based analysis. The present chapter gives an overview of general concepts and the various types of analysis possible in GIS; vector- and raster-based spatial analysis are discussed in detail in the next two chapters, along with the tools and operations provided. As the concepts are more or less the same in both vector- and raster-based analysis, we shall discuss some of the concepts in vector mode and others in raster mode.

SIGNIFICANCE OF SPATIAL ANALYSIS

Spatial analysis is one of the most important uses of GIS. Since the advent of GIS in the 1980s, many government agencies have invested heavily in GIS installations, including the purchase of hardware and software and the construction of mammoth databases. Two fundamental functions of GIS have been widely realized: generation of maps and generation of tabular reports. Indeed, GIS provides a very effective tool for generating maps and statistical reports from a database. However, GIS functionality far exceeds the purposes of mapping and report compilation. In addition to the basic functions related to automated cartography and database management systems, the most important capabilities of GIS are those for spatial analysis. As spatial information is organized in a GIS, it should be able to answer complex questions regarding space.

Making maps alone does not justify the high cost of building a GIS. The same maps may be produced using a simpler cartographic package. Likewise, if the purpose is to generate tabular output, then a simpler database management system or a statistical package may be a more efficient solution. It is spatial analysis that requires the logical connections between attribute data and map features, and the operational procedures built on the spatial relationships among map features. These capabilities make GIS a much more powerful and cost-effective tool than automated cartographic packages, statistical packages or database management systems. Indeed, functions required for performing spatial analyses that are not available in either cartographic packages or database management systems are commonly implemented in GIS.

USING GIS FOR SPATIAL ANALYSIS

Spatial analysis in GIS involves three types of operations: attribute query (also known as aspatial query), spatial query, and generation of new data sets from the original database. The scope of spatial analysis ranges from a simple query about the spatial phenomenon to complicated combinations of attribute queries, spatial queries, and alterations of original data.

Attribute Query: Requires the processing of attribute data exclusive of spatial information. In other words, it is a process of selecting information by asking logical questions.

Example: From a database of a city parcel map where every parcel is listed with a land use code, a simple attribute query may require the identification of all parcels for a specific land use type. Such a query can be handled through the table without referencing the parcel map (Fig. 1). Because no spatial information is required to answer this question, the query is considered an attribute query. In this example, the entries in the attribute table that have land use codes identical to the specified type are identified.

Parcel No.   Size    Value     Land Use
102          7,500   200,000   Commercial
103          7,500   160,000   Residential
104          9,000   250,000   Commercial
105          6,600   125,000   Residential

A sample parcel map; attribute table of the sample parcel map.
Listing of Parcel No. and Value with land use = 'commercial' is an attribute query.
Identification of all parcels within 100-m distance is a spatial query.
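The difference between the two kinds of query can be sketched in a few lines of Python. The attribute values are those of the sample parcel table; the centroid coordinates (in metres) are hypothetical, since the parcel map itself is not reproduced, and the two function names are illustrative only.

```python
import math

parcels = [
    {"parcel_no": 102, "size": 7500, "value": 200000, "land_use": "Commercial"},
    {"parcel_no": 103, "size": 7500, "value": 160000, "land_use": "Residential"},
    {"parcel_no": 104, "size": 9000, "value": 250000, "land_use": "Commercial"},
    {"parcel_no": 105, "size": 6600, "value": 125000, "land_use": "Residential"},
]

# Hypothetical parcel centroids in metres (an assumption, not from the source map).
centroids = {102: (0.0, 0.0), 103: (60.0, 0.0), 104: (0.0, 80.0), 105: (150.0, 150.0)}


def attribute_query(land_use):
    """Uses the attribute table only: parcel number and value for one land use."""
    return [
        (p["parcel_no"], p["value"])
        for p in parcels
        if p["land_use"].lower() == land_use.lower()
    ]


def spatial_query(target_no, max_distance):
    """Needs locations: parcels whose centroid lies within max_distance of the target."""
    tx, ty = centroids[target_no]
    return [
        no
        for no, (x, y) in centroids.items()
        if no != target_no and math.hypot(x - tx, y - ty) <= max_distance
    ]


commercial = attribute_query("commercial")
near_102 = spatial_query(102, 100.0)
```

The first function never touches the coordinates, while the second cannot be answered from the attribute table alone. A real GIS would measure distance between parcel boundaries rather than centroids.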

Fig. 1

Spatial Query: Involves selecting features based on location or spatial relationships, which requires processing of spatial information. For instance, a question may be raised about which parcels lie within one mile of the freeway. In this case, the answer can be obtained either from a hardcopy map or by using a GIS with the required geographic information (Fig. 2).

Example: When a request is submitted for rezoning, all owners whose land is within a certain distance of the parcels that may be rezoned must be notified of the public hearing. A spatial query is required to identify all parcels within the specified distance. This process cannot be accomplished without spatial information. In other words, the attribute table of the database alone does not provide sufficient information for solving problems that involve location.

[Figure labels: Parcels for rezoning; Parcels for notification]

Fig. 2: Land owners within a specified distance from the parcel to be rezoned, identified through spatial query

While basic spatial analysis involves some attribute queries and spatial queries, complicated analyses typically require a series of GIS operations including multiple attribute and spatial queries, alteration of original data, and generation of new data sets. The methods for structuring and organizing such operations are a major concern in spatial analysis. An effective spatial analysis is one in which the best available methods are appropriately employed for different types of attribute queries, spatial queries, and data alteration. The design of the analysis depends on the purpose of study.

GIS USAGE IN SPATIAL ANALYSIS

GIS can interrogate geographic features and retrieve associated attribute information, called identification. It can generate new sets of maps by query and analysis. It also derives new information by spatial operations. Some analytical procedures applied with a GIS are described here. GIS operational procedures and analytical tasks that are particularly useful for spatial analysis include:

- Single layer operations
- Multi layer operations / Topological overlay
- Spatial modelling
- Geometric modelling
  - Calculating the distance between geographic features
  - Calculating area, length and perimeter
  - Geometric buffers
- Point pattern analysis
- Network analysis
- Surface analysis
- Raster/Grid analysis
- Fuzzy spatial analysis
- Geostatistical tools for spatial analysis

1. Single layer operations: are procedures which correspond to queries and alterations of data that operate on a single data layer.

Example: Creating a buffer zone around all streets of a road map is a single layer operation, as shown in figure 3.

[Figure labels: Streets; Buffer zones]

Fig. 3: Buffer zones extended from streets
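One simple way to reason about a buffer is as the set of locations whose distance to the street is no more than the buffer width. The sketch below tests whether a point falls in such a buffer around a polyline; the street coordinates and the function names are hypothetical, and a GIS would normally construct the buffer polygon itself rather than test points one at a time.

```python
import math


def distance_to_segment(p, a, b):
    """Shortest distance from point p to the line segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Position of the closest point along the segment, clamped to its ends.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def in_buffer(p, street, width):
    """True if p lies within 'width' of any segment of the street polyline."""
    return any(
        distance_to_segment(p, a, b) <= width for a, b in zip(street, street[1:])
    )


# Hypothetical street made of two segments (coordinates in metres).
street = [(0.0, 0.0), (100.0, 0.0), (100.0, 50.0)]

inside = in_buffer((50.0, 8.0), street, 10.0)
outside = in_buffer((50.0, 30.0), street, 10.0)
```

Only the street layer is consulted, which is what makes buffering a single layer operation.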

2. Multi layer operations: are useful for manipulation of spatial data on multiple data layers. Figure 4 depicts the overlay of two input data layers representing a soil map and a land use map respectively. The overlay of these two layers produces a new map in which the different combinations of soil and land use are delineated.

Fig. 4: The overlay of two data layers creates a map of combined polygons.
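The idea of an overlay can be sketched on two small grids that cover the same area. This is a raster simplification of the polygon overlay in Figure 4, and the soil and land use classes are invented for the example.

```python
# Two hypothetical input layers on the same 2 x 3 grid.
soil = [
    ["clay", "clay", "sand"],
    ["clay", "sand", "sand"],
]
landuse = [
    ["farm", "urban", "urban"],
    ["farm", "farm", "urban"],
]


def overlay(layer_a, layer_b):
    """Combine two aligned grids cell by cell into (a, b) pairs."""
    return [
        [(a, b) for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(layer_a, layer_b)
    ]


combined = overlay(soil, landuse)

# The distinct combinations are the classes of the new map.
classes = sorted({cell for row in combined for cell in row})
```

Each distinct pair, such as sand under farm land, corresponds to one of the combined polygons that a vector overlay would delineate.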

3. Topological overlays: These are multi-layer operations which allow features from different layers to be combined to form a new map, giving new information and features that were not present in the individual maps. This topic will be discussed in detail in the section on vector-based analysis.

4. Spatial modeling: involves the construction of explanatory and predictive models for statistical testing.

Figure 5 shows an example of air pollution spatial modeling. Emissions of a specific particulate are measured at monitoring stations represented as point locations on the bottom layer. The distribution of air pollution is believed to be related to soils (silt content and other soil characteristics), agricultural operations, roads, and topography. With a data base containing all required data elements, a spatial model can be constructed to explain the distribution of air pollution based on these related variables.

Fig. 5: A spatial model of air pollution based on the distributions of several variables.

5. Point pattern analysis: deals with the examination and evaluation of spatial patterns and the processes of point features. A typical biological survey map is shown in figure 6, in which each point feature denotes the observation of an endangered species such as big horn sheep in southern California. The objective of illustrating point features is to determine the most favourable environmental conditions for this species. Consequently, the spatial distribution of species can be examined in a point pattern analysis. If the distribution illustrates a random pattern, it may be difficult to identify significant factors that influence species distribution. However, if observed locations show a systematic pattern such as the clusters in this diagram, it is possible to analyze the animals' behaviour in terms of environmental characteristics. In general, point pattern analysis is the first step in studying the spatial distribution of point features.

Fig. 6: Distribution of an endangered species examined in a point pattern analysis.
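One common way to tell a clustered pattern from a random or dispersed one is the nearest-neighbour ratio, which compares the observed mean distance to the nearest neighbour with the value expected for a random pattern of the same density, 0.5 / sqrt(n / area). The text does not name this statistic, so it is offered here only as an illustrative choice; the point coordinates and the study area are hypothetical, and edge effects are ignored.

```python
import math


def nearest_neighbour_ratio(points, area):
    """Observed mean nearest-neighbour distance divided by the random expectation.

    Values well below 1 suggest clustering; values well above 1 suggest dispersion.
    """
    n = len(points)
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        total += min(
            math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(points)
            if j != i
        )
    observed = total / n
    expected = 0.5 / math.sqrt(n / area)
    return observed / expected


# Hypothetical observations in a 10 x 10 study area.
clustered = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (2.0, 2.0)]
dispersed = [(2.5, 2.5), (2.5, 7.5), (7.5, 2.5), (7.5, 7.5)]

r_clustered = nearest_neighbour_ratio(clustered, 100.0)
r_dispersed = nearest_neighbour_ratio(dispersed, 100.0)
```

With only four points the ratio is not statistically meaningful; the example only shows the direction in which the value moves.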

6. Network analysis: designed specifically for line features organized in connected networks, typically applies to transportation problems and location analysis such as school bus routing, passenger plotting, walking distance, bus stop optimization, optimum path finding etc.

Figure 7 shows a common application of GIS-based network analysis. Routing is a major concern for the transportation industry. For instance, trucking companies must determine the most cost-effective way of connecting stops for pick-up or delivery. In this example, a route is to be delineated for a truck to pick up packages at five locations. A routing application can be developed to identify the most efficient route for any set of pick-up locations. The highlighted line represents the most cost-effective way of linking the five locations.

Fig. 7: The most cost effective route links five point locations on the street map.
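A building block of such routing is finding the least-cost path between two locations on the network. The sketch below uses Dijkstra's algorithm on a small hypothetical street graph whose edge weights stand for travel cost. It does not solve the full multi-stop routing problem of Figure 7, which also requires choosing the order of the stops.

```python
import heapq


def shortest_path(graph, start, goal):
    """Dijkstra's algorithm: return (total cost, list of nodes) from start to goal."""
    queue = [(0.0, start, [start])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbour, weight in graph.get(node, {}).items():
            if neighbour not in visited:
                heapq.heappush(queue, (cost + weight, neighbour, path + [neighbour]))
    return float("inf"), []


# Hypothetical street network: junctions A-D with travel costs on each link.
streets = {
    "A": {"B": 4, "C": 2},
    "B": {"A": 4, "C": 1, "D": 5},
    "C": {"A": 2, "B": 1, "D": 8},
    "D": {"B": 5, "C": 8},
}

cost, route = shortest_path(streets, "A", "D")
```

Here the cheapest route from A to D goes through C and B at a cost of 8, beating the more direct A-B-D (9) and A-C-D (10).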

7. Surface analysis: deals with the spatial distribution of surface information in terms of a three-dimensional structure. The distribution of any spatial phenomenon can be displayed in a three-dimensional perspective diagram for visual examination. A surface may represent the distribution of a variety of phenomena, such as population, crime, market potential, and topography, among many others. The perspective diagram in figure 8 represents topography of the terrain, generated from a digital elevation model (DEM) through a series of GIS-based operations in surface analysis.

Fig. 8: Perspective diagram representing topography of the terrain derived from a surface analysis.

8. Grid analysis: involves the processing of spatial data in a special, regularly spaced form.

The following illustration (figure 9) shows a grid-based model of fire progression. The darkest cells in the grid represent the area where a fire is currently underway. A fire probability model which incorporates fire behaviour in response to environmental conditions such as wind and topography delineates areas that are most likely to burn in the next two stages. These areas are represented by lighter shaded cells. Fire probability models are especially useful to fire fighting agencies for developing quick-response, effective suppression strategies.

Fig. 9: A fire behaviour model delineates areas of fire progression based on a grid analysis.
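The staged progression on a grid can be sketched with a very simple rule: at each stage the fire reaches every cell that shares an edge with a cell already burning. This deliberately ignores wind, topography and probability, so it is not the fire probability model described above, only an illustration of cell-by-cell grid processing; the function name and grid size are hypothetical.

```python
def spread(burning, rows, cols, steps):
    """Return a grid giving the stage at which fire reaches each cell (None if never)."""
    arrival = [[None] * cols for _ in range(rows)]
    frontier = list(burning)
    for r, c in frontier:
        arrival[r][c] = 0  # stage 0: cells where the fire is currently underway
    for step in range(1, steps + 1):
        next_frontier = []
        for r, c in frontier:
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and arrival[nr][nc] is None:
                    arrival[nr][nc] = step
                    next_frontier.append((nr, nc))
        frontier = next_frontier
    return arrival


# Fire starts in the centre of a 3 x 3 grid; look two stages ahead.
stages = spread([(1, 1)], rows=3, cols=3, steps=2)
```

Cells with stage 1 and stage 2 play the role of the lighter shaded cells in Figure 9. A more realistic model would weight each neighbour by wind direction, slope and fuel.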

In most cases, GIS software provides the most effective tool for performing the above tasks.

9. Fuzzy Spatial Analysis

Fuzzy spatial analysis is based on fuzzy set theory, formulated by Zadeh (1965). Fuzzy set theory is a generalization of Boolean algebra to situations where zones of gradual transition are used to divide classes, instead of conventional crisp boundaries. This is more relevant in many cases where one considers 'distance to certain zone' or 'distance to road', in which case the influence of this factor is more likely to be some function of distance than a binary 'yes' or 'no'. Also, in fuzzy theory maps are prepared showing gradual change in the variable from very high to very low, which is a truer representation of the real world. As stated above, conventional crisp sets allow only a binary membership function (i.e. true or false), whereas a fuzzy set is a class that admits the possibility of partial membership, so fuzzy sets are a generalization of crisp sets to situations where the class membership or class boundaries are not, or cannot be, sharply defined.

Applications: Data integration using fuzzy operators: Various thematic data layers, represented by respective membership values, can be combined by using standard rules of fuzzy algebra.

Example: In a grid cell/pixel, if a particular litho-unit occurs in combination with a thrust/fault, its membership value should be much higher compared with the individual membership values of the litho-unit or thrust/fault. This is significant as the effect is expected to be "increasive" in our present consideration, and it can be calculated by the fuzzy algebraic sum. Similarly, if the presence of two or a set of parameters results in a "decreasive" effect, it can be calculated by the fuzzy algebraic product. Besides this, fuzzy algebra offers various other methods to combine different data sets for landslide hazard zonation map preparation. To combine a number of exploration data sets, five such operators exist, namely the fuzzy AND, the fuzzy OR, the fuzzy algebraic product, the fuzzy algebraic sum and the fuzzy gamma operator.
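The five operators named above can be written down compactly using their usual definitions: minimum for AND, maximum for OR, the product of the memberships, one minus the product of the complements for the algebraic sum, and a weighted combination of sum and product for the gamma operator. The membership values used for the litho-unit and the thrust/fault are hypothetical.

```python
import math


def fuzzy_and(values):
    return min(values)


def fuzzy_or(values):
    return max(values)


def fuzzy_product(values):
    """'Decreasive': the result is never larger than the smallest membership."""
    return math.prod(values)


def fuzzy_sum(values):
    """'Increasive': the result is never smaller than the largest membership."""
    return 1.0 - math.prod(1.0 - v for v in values)


def fuzzy_gamma(values, gamma):
    """Blend of sum and product; gamma = 1 gives the sum, gamma = 0 the product."""
    return fuzzy_sum(values) ** gamma * fuzzy_product(values) ** (1.0 - gamma)


# Hypothetical memberships for one pixel: litho-unit 0.6, thrust/fault 0.5.
memberships = [0.6, 0.5]

combined_sum = fuzzy_sum(memberships)          # larger than either input
combined_product = fuzzy_product(memberships)  # smaller than either input
combined_gamma = fuzzy_gamma(memberships, 0.5)
```

For these inputs the algebraic sum is about 0.8 and the algebraic product about 0.3, which shows the "increasive" and "decreasive" behaviour described in the example.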

Fuzzy logic can also be used to handle mapping errors or uncertainty, i.e. errors associated with clear demarcation of boundaries and also errors present in areas where limited ground truth exists, in studies such as landslide hazard zonation. These two kinds of errors are almost inherent to the process of data collection from different sources, including remote sensing.

10. Geostatistical Tools for Spatial Analysis

Geostatistics studies the spatial variability of regionalized variables: variables that have an attribute value and a location in a two- or three-dimensional space. Tools to characterize the spatial variability are the spatial autocorrelation function and the variogram.

A variogram is calculated from the variance of pairs of points at different separations. For several distance classes or lags, all point pairs that match that separation are identified and their variance is calculated. Repeating this process for various distance classes yields a variogram: a function in which variance is plotted against distance class.
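The procedure just described can be sketched as follows. The text does not state which variance estimator is used, so this sketch assumes the usual semivariance, i.e. half the mean squared difference between the values of all point pairs falling in a lag class. The function name, the `(x, y, value)` point layout and the sample data are illustrative assumptions.

```python
import math
from collections import defaultdict


def empirical_variogram(points, lag_width, n_lags):
    """Return (lag centre, semivariance, number of pairs) for each non-empty lag.

    points is a list of (x, y, value) tuples.
    """
    squared_diff_sum = defaultdict(float)
    pair_count = defaultdict(int)
    for i in range(len(points)):
        xi, yi, zi = points[i]
        for j in range(i + 1, len(points)):
            xj, yj, zj = points[j]
            separation = math.hypot(xi - xj, yi - yj)
            lag = int(separation // lag_width)  # distance class of this pair
            if lag < n_lags:
                squared_diff_sum[lag] += (zi - zj) ** 2
                pair_count[lag] += 1
    result = []
    for lag in range(n_lags):
        if pair_count[lag]:
            centre = (lag + 0.5) * lag_width
            semivariance = squared_diff_sum[lag] / (2 * pair_count[lag])
            result.append((centre, semivariance, pair_count[lag]))
    return result


# Three hypothetical sample points on a line, one unit apart.
samples = [(0, 0, 1.0), (1, 0, 2.0), (2, 0, 4.0)]
variogram = empirical_variogram(samples, lag_width=1, n_lags=3)
```

Plotting the semivariance against the lag centre gives the variogram; the pair count is returned as well because lags with very few pairs are usually unreliable.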

Similarly, the spatial autocorrelation can be calculated and plotted in an autocorrelogram. These functions can be used to measure the spatial variability not only of point data but also of maps or images.

1. Spatial autocorrelation of point data

The statistical analysis referred to as spatial autocorrelation examines the correlation of a random process with itself in space. Many variables that have discrete values measured at several specific geographic positions (i.e., individual observations can be approximated by dimensionless points) can be considered random processes and can thus be analyzed using spatial autocorrelation analysis.


Examples of such phenomena are: total amount of rainfall, toxic element concentration, grain size, elevation at triangulated points, etc. The spatial autocorrelation function shown in a graph is referred to as a spatial autocorrelogram, showing the correlation between a series of points or a map and itself for different shifts in space or time. It visualizes the spatial variability of the phenomena under study. In general, pairs of points that are close to each other have, on average, a lower variance (i.e., are better correlated) than pairs of points at larger separation. The autocorrelogram quantifies this relationship and gives insight into the spatial behavior of the phenomenon under study.

2. Point interpolation

A point interpolation performs an interpolation on randomly distributed point values and returns regularly distributed point values. The various interpolation methods are: Voronoi Tessellation, moving average, trend surface and moving surface.

Example: Nearest Neighbor (Voronoi Tessellation). In this method the value, identifier, or class name of the nearest point is assigned to the pixels. It offers a quick way to obtain a Thiessen map from point data (Figure 1).
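A minimal sketch of the nearest neighbor method is given below. It assumes that point locations are expressed directly in row and column units of the output grid and that straight-line distance is used. The function name and the two sample points are hypothetical.

```python
def nearest_neighbour_grid(points, n_rows, n_cols):
    """Assign to every pixel the value of its nearest input point.

    points is a list of (row, col, value) tuples.
    """
    grid = []
    for r in range(n_rows):
        row = []
        for c in range(n_cols):
            # Squared distance is enough for finding the nearest point.
            nearest = min(points, key=lambda p: (p[0] - r) ** 2 + (p[1] - c) ** 2)
            row.append(nearest[2])
        grid.append(row)
    return grid


# Two hypothetical sample points labelled A and B on a 1 x 4 strip of pixels.
thiessen = nearest_neighbour_grid([(0, 0, "A"), (0, 3, "B")], n_rows=1, n_cols=4)
```

Each pixel takes the label of the closer point, so the output is a rasterized Thiessen map. When a pixel is equally far from two points, this sketch simply keeps the first one in the list.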

Figure 26: (a) An input point map, (b) The output map obtained as the result of the interpolation operation applying the Voronoi Tessellation method.

MODELING/ANALYSIS ISSUES INVOLVED IN GIS:

Must understand data in totality and their relationships
Data accuracy and quality
Selecting right parameters for integration
Criteria formulation depending upon aim and objective of analysis


(A) VECTOR BASED SPATIAL DATA ANALYSIS

In the present chapter, the basic concepts of various overlay operations are laid down. The sequence of performing a simple spatial analysis is also given. Finally, important points to be taken care of while doing any kind of spatial analysis are given.

VARIOUS TYPES OF OVERLAY OPERATIONS IN GIS

These are multi-layer operations which allow features from different layers to be combined to form a new map, giving new information and features that were not present in the individual maps.

Topological overlays: Selective overlay of polygons, lines and points enables users to generate a map containing features and attributes of interest, extracted from different themes or layers. Overlay operations can be performed on both raster (or grid) and vector maps. In the case of raster maps, the Map Calculation tool is used to perform the overlay. We shall be discussing the various overlay operations offered by ARC/INFO.

In topological overlays, polygon features of one layer can be combined with point, line and polygon features of another layer.

Polygon-in-polygon overlay:

Output is polygon coverage. Coverages are overlaid two at a time. There is no limit on the number of coverages to be combined. A new FAT (feature attribute table) is created having information about each newly created feature.

Line-in-polygon overlay:

Output is line coverage with additional attribute. No polygon boundaries are copied. New arc-node topology is created.

Point-in-polygon overlay:

Output is point coverage with additional attributes. No new point features are created. No polygon boundaries are copied.

Logical Operators: Overlay analysis manipulates spatial data organized in different layers to create combined spatial features according to logical conditions specified in Boolean algebra with the help of logical and conditional operators. The logical conditions are specified with operands (data elements) and operators (relationships among data elements).

Note: In vector overlay, arithmetic operations are performed with the help of logical operators; there is no direct way to do it.

Common logical operators include AND, OR, XOR (Exclusive OR), and NOT. Each operation is characterized by specific logical checks of decision criteria to determine if a condition is true or false. The following table shows the true/false conditions of the most common Boolean operations. In this table, A and B are two operands. One (1) implies a true condition and zero (0) implies false. Thus, if the A condition is true while the B condition is false, then the combined condition of A AND B is false, whereas the combined condition of A OR B is true.

AND : Common Area / Intersection / Clipping Operation
OR  : Union or Addition
NOT : (Inverter)
XOR : Minus

Truth table of common Boolean operations

A   B   A AND B   A OR B   A NOT B   B NOT A   A XOR B
0   0      0         0        0         0         0
0   1      0         1        0         1         1
1   0      0         1        1         0         1
1   1      1         1        0         0         0

The most common basic multi-layer operations are union, intersection, and identity operations. All three operations merge spatial features on separate data layers to create new features from the original coverage. The main difference among these operations is in the way spatial features are selected for processing.

Conditional Operators:

EQ   =       Equal to
NE   #, <>   Not equal to
GE   >=      Greater than or equal to
LE   <=      Less than or equal to
GT   >       Greater than
LT   <       Less than
CN           Containing
NC           Not containing
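The truth table of Boolean operations given in this section can be reproduced with a few bitwise expressions. In this sketch "A NOT B" is read as "A and not B", which is an interpretation inferred from the table values rather than something the text states. The function name is illustrative.

```python
from itertools import product


def boolean_row(a, b):
    """Evaluate the five Boolean combinations for operands a and b (0 or 1)."""
    return {
        "A AND B": a & b,
        "A OR B": a | b,
        "A NOT B": a & (1 - b),  # true where A holds and B does not
        "B NOT A": b & (1 - a),  # true where B holds and A does not
        "A XOR B": a ^ b,
    }


# One entry per combination of operands, in the order 00, 01, 10, 11.
truth_table = [(a, b, boolean_row(a, b)) for a, b in product((0, 1), repeat=2)]
```

The same expressions can be applied cell by cell to two binary raster layers to obtain the corresponding overlay.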

[Figure: CLIP, ERASE and SPLIT operations, each shown as primary layer, operation layer and result]

Fig. 1: Overlay operations

1. VARIOUS TYPES OF SPATIAL OPERATIONS IN GIS

Spatial Join Operations:

IDENTITY, INTERSECT and UNION provide different types of overlay operations and give flexibility for geographic data manipulation and analysis. In polygon overlay, features from two map coverages are geometrically intersected to produce a new set of information. Attributes for these new features are derived from the attributes of both the original coverages, and thereby contain new spatial and attribute data relationships.

[Figure: IDENTITY, INTERSECT and UNION overlays]

IDENTITY [in_cover] [identity_cover] [out_cover] {POLY / LINE / POINT} {fuzzy_tolerance} computes the geometric intersection of two coverages. Only those features overlaying the feature extent of the first specified coverage are preserved. Feature attributes from both coverages are joined in the output coverage.

IDENTITY is similar to UNION and INTERSECT. These differ from IDENTITY only in the features which remain in the output coverage.

INTERSECT [in_cover] [intersect_cover] [out_cover] {POLY / LINE / POINT} {fuzzy_tolerance} computes the geometric intersection of two coverages. Only those features in the area common to both are preserved. Feature attributes from both coverages are joined in the output coverage.

UNION [in_cover] [union_cover] [out_cover] {fuzzy_tolerance} computes the geometric intersection of two polygon coverages. All features and attributes of both coverages are preserved.
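The difference between the three operations lies only in which features survive, and this can be illustrated with a deliberately simplified model. The sketch below is not how ARC/INFO processes vector coverages: it assumes each coverage has been reduced to a dictionary that maps a grid cell to an attribute, and all names and sample data are hypothetical.

```python
def intersect(in_cover, other_cover):
    # Keep only the area common to both coverages; attributes are joined.
    return {cell: (in_cover[cell], other_cover[cell])
            for cell in in_cover if cell in other_cover}


def identity(in_cover, identity_cover):
    # Keep the whole extent of the first coverage; attributes are joined
    # where the second coverage overlaps it, otherwise left empty (None).
    return {cell: (in_cover[cell], identity_cover.get(cell))
            for cell in in_cover}


def union(in_cover, union_cover):
    # Keep everything from both coverages.
    return {cell: (in_cover.get(cell), union_cover.get(cell))
            for cell in in_cover.keys() | union_cover.keys()}


# Hypothetical land cover and soil coverages that overlap in one cell.
landcover = {(0, 0): "forest", (0, 1): "forest"}
soil = {(0, 1): "clay", (0, 2): "sand"}
```

For these two sample coverages, `intersect` returns one cell, `identity` returns the two cells of the first coverage, and `union` returns all three cells.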


Feature Extraction Operations:

CLIP, ERASE and RESELECT facilitate extraction of desired features from a coverage either by using a template coverage or by using spatial or logical criteria.

CLIP [in_cover] [clip_cover] [out_cover] {POLY / LINE / POINT / NET / LINK} {fuzzy_tolerance} extracts the features from a coverage that overlap another coverage, using the clip coverage as a 'cookie cutter'.

ERASECOV [in_cover] [erase_cover] [out_cover] {POLY / LINE / POINT / NET / LINK} {fuzzy_tolerance} erases the features from a coverage that overlap another coverage.


RESELECT [in_cover] [out_cover] {POLY / LINE / POINT} {sml_file} extracts map features from a coverage based on their attribute values.

Feature Merging Operations:

DISSOLVE and ELIMINATE enable, respectively, the merging of polygons to create new polygon features and the removal of spurious/sliver polygons resulting from an overlay operation.

Proximal Operations:

ARC/INFO provides the BUFFER command, which can be used to define a zone of specified distance around a selected feature. Different sized buffers can be generated around a selected feature based on associated attribute data.

Map Database Merging and Splitting Operations:

MAPJOIN and SPLIT commands facilitate the merging or splitting of maps.

Coordinate Transformation Operations:

PROJECT and TRANSFORM commands enable coordinate transformation using affine or projective transformation based on a set of control points. PROJECT supports coordinate transformation between any two projections.

* Note that ESRI ARC/INFO offers all the above functions and commands for geographic information manipulation and analysis. Recent versions of ARC/INFO [ver. 7.1.1 & 7.1.2 (Unix based)] also provide special functions designed for manipulation and analysis of REGIONS. It offers a host of commands, viz. REGIONBUFFER, REGIONDISSOLVE, AREAQUERY, REGIONQUERY, REGIONSELECT etc.

BUFFER ANALYSIS

Spatial searching (also called buffering or proximity analysis) is based on the distance derived from certain selected features. Area expansion of features is commonly known as buffer operation in GIS. It is used to highlight a zone of interest around a point, line and polygon which in turn can be used to retrieve attribute data or generate new features. Both constant and variable width buffers can be generated. Because the buffer operation expands area, it always results in polygon features.


Salient Features of Buffer Operation:

Buffer can be used to generate buffer zones around a point, line or polygon feature. It allows finding the areas around the feature within the specified buffer zone.
BUFFER creates a new output coverage by generating buffer zones around input coverage features.
Input coverage features can be points, lines, polygons or nodes.
Output coverage features will always be polygons.
Polygon topology is created for the output coverage.
New label points are created in each polygon.
Each polygon is flagged according to the type of area it represents, and the flag is stored in an item called INSIDE in the output coverage PAT:
    100 - Polygon representing a buffer zone.
    1 - Polygon outside the buffer zone.
Buffer zones can be controlled in two ways:
    By using a buffer distance to specify a single size for all buffer zones.
    By specifying a buffer item and, optionally, a buffer table to generate multiple buffer sizes.

Syntax: BUFFER <in-cover> <out-cover> {buffer-item} {buffer-table} {buffer-distance} {fuzzy-tolerance} {LINE | POINT | NODE}

[Figure: point coverage, buffer zones, output polygon coverage]
Buffer zones (middle) are generated from a point coverage (left), resulting in a polygon coverage (right).
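The idea of a constant-width buffer around point features can be sketched on a grid. This is a raster approximation written for illustration, not the vector algorithm used by the BUFFER command. It borrows the INSIDE coding described above (100 inside a buffer zone, 1 outside); the function name, the grid size and the sample point are assumptions.

```python
import math


def buffer_points(points, distance, n_rows, n_cols, cell_size=1.0):
    """Flag each cell 100 if it lies within `distance` of any point, else 1.

    points is a list of (row, col) positions; distances are measured between
    cell centres and scaled by cell_size.
    """
    grid = []
    for r in range(n_rows):
        row = []
        for c in range(n_cols):
            inside = any(
                math.hypot(r - pr, c - pc) * cell_size <= distance
                for pr, pc in points
            )
            row.append(100 if inside else 1)
        grid.append(row)
    return grid


# One hypothetical point in the middle of a 3 x 3 grid, buffered by one cell width.
zones = buffer_points([(1, 1)], distance=1.0, n_rows=3, n_cols=3)
```

A variable-width buffer could be sketched the same way by storing a separate distance with each point, which corresponds to the buffer item option.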


[Figure: line coverage, output polygon coverage]
Buffer zones generated from a line coverage (left) define a polygon coverage (right).

[Figure: input polygon coverage, buffer zones, output polygon coverage]
Buffer operation creates an expanded polygon coverage (right) from two separate polygons (left).

Fig. 10: Proximity operations

STEPS FOR PERFORMING GEOGRAPHIC ANALYSIS

Step 1. Establish the objectives & criteria for the analysis
Step 2. Prepare data for spatial operations
Step 3. Perform spatial operation
Step 4. Prepare data for tabular analysis
Step 5. Perform tabular operations
Step 6. Evaluate & interpret the results
Step 7. Refine the analysis as necessary

MODELING ISSUES INVOLVED IN PERFORMING GEOGRAPHIC ANALYSIS:

Must understand data in totality and their relationships
Data accuracy and quality
Selecting right parameters for integration
Criteria formulation depending upon aim and objective of analysis


(B) SPATIAL DATA ANALYSIS: RASTER BASED

1. INTRODUCTION

The present section discusses operational procedures and quantitative methods for the analysis of spatial data in raster format. In raster analysis, geographic units are regularly spaced, and the location of each unit is referenced by row and column positions. Because geographic units are of equal size and identical shape, area adjustment of geographic units is unnecessary and spatial properties of geographic entities are relatively easy to trace. All cells in a grid have a positive position reference, following the left-to-right and top-to-bottom data scan as shown in figure 1. Every cell in a grid is an individual unit and must be assigned a value. Depending on the nature of the grid, the value assigned to a cell can be an integer or a floating point number. When data values are not available for particular cells, they are described as NODATA cells. NODATA cells differ from cells containing zero in the sense that a zero value is considered to be data.

Fig. 1: Common coordinate systems and grid follow different referencing structures

The regularity in the arrangement of geographic units allows for the underlying spatial relationships to be efficiently formulated. For instance, the distance between orthogonal neighbors (neighbors on the same row or column) is always a constant whereas the distance between two diagonal units can also be computed as a function of that constant. Therefore, the distance between any pair of units can be computed from differences in row and column positions. Furthermore, directional information is readily available for any pair of origin and destination cells as long as their positions in the grid are known.
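A short sketch of how distance and direction follow from row and column positions is given below. It assumes square cells, rows numbered from top to bottom, and a direction expressed as a bearing in degrees clockwise from the top of the grid. These conventions and the function names are choices made for the example.

```python
import math


def cell_distance(origin, destination, cell_size=1.0):
    """Straight-line distance between two cell centres given as (row, col)."""
    d_row = destination[0] - origin[0]
    d_col = destination[1] - origin[1]
    return cell_size * math.hypot(d_row, d_col)


def cell_bearing(origin, destination):
    """Bearing in degrees, clockwise from 'up' (towards row 0)."""
    d_row = destination[0] - origin[0]
    d_col = destination[1] - origin[1]
    # Rows grow downwards, so 'up' corresponds to a negative row difference.
    return math.degrees(math.atan2(d_col, -d_row)) % 360.0


orthogonal = cell_distance((0, 0), (0, 1), cell_size=10.0)  # one cell width
diagonal = cell_distance((0, 0), (1, 1), cell_size=10.0)    # cell width times sqrt(2)
```

The orthogonal distance equals the cell size and the diagonal distance equals the cell size multiplied by the square root of 2, as stated in the paragraph above.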


1.1 Advantages of using the raster format in spatial analysis are listed below :

Efficient processing : Because geographic units are regularly spaced with identical spatial properties, multiple layer operations can be processed very efficiently.

Numerous existing sources : Grids are the common format for numerous sources of spatial information including satellite imagery, scanned aerial photos, and digital elevation models, among others. These data sources have been adopted in many GIS projects and have become the most common sources of major geographic databases.

Different feature types organized in the same layer : For instance, the same grid may consist of point features, line features, and area features, as long as different features are assigned different values.

1.2 Grid format disadvantages appear below :

Data redundancy : When data elements are organized in a regularly spaced system, there is a data point at the location of every grid cell, regardless of whether the data element is needed or not. Although several compression techniques are available, the advantages of gridded data are lost whenever the gridded data format is altered through compression. In most cases, the compressed data cannot be directly processed for analysis. Instead, the compressed raster data must first be decompressed in order to take advantage of spatial regularity.

Resolution confusion : Gridded data give an unnatural look and unrealistic presentation unless the resolution is sufficiently high. Conversely, spatial resolution dictates spatial properties. For instance, some spatial statistics derived from a distribution may be different if the spatial resolution varies, which is the result of the well-known scale problem.

Cell value assignment difficulties : Different methods of cell value assignment may result in quite different spatial patterns.


2. GRID OPERATIONS USED IN MAP ALGEBRA

Common operations in grid analysis consist of the following functions, which are used in Map Algebra to manipulate grid files. The Map Algebra language is a programming language developed to perform cartographic modeling. Map Algebra performs the following four basic operations:

local functions: that work on every single cell,
focal functions: that process the data of each cell based on the information of a specified neighborhood,
zonal functions: that provide operations that work on each group of cells of identical values, and
global functions: that work on a cell based on the data of the entire grid.

The principal functionality of these operations is described here.

2.1 Local Functions: Local functions process a grid on a cell-by-cell basis, that is, each cell is processed based solely on its own values, without reference to the values of other cells. In other words, the output value is a function of the value or values of the cell being processed, regardless of the values of surrounding cells.

For single layer operations, a typical example is changing the value of each cell by adding or multiplying a constant. In the following example, the input grid contains values ranging from 0 to 4. Blank cells represent NODATA cells. A simple local function multiplies every cell by a constant of 3 (Fig. 2). The results are shown in the output grid at the right. When there is no data for a cell, the corresponding cell of the output grid remains a blank.

Input Grid                  Output Grid

2   0   1   1               6   0   3   3
2   3   0   4     x 3  =    6   9   0   12
4   -   2   3               12  -   6   9
1   1   -   2               3   3   -   6

Fig. 2: A local function multiplies each cell in the input grid by 3 to produce the output grid (- marks a blank NODATA cell)

Local functions can also be applied to multiple layers represented by multiple grids of the same geographic area (Fig. 3).


Input Grid             Multiplier Grid          Output Grid

2   0   1   1          1   1   2   2            2   0   2   2
2   3   0   4     x    1   2   2   2      =     2   6   0   8
4   -   2   3          2   2   3   3            8   -   6   9
1   1   -   2          2   3   3   4            2   3   -   8

Fig. 3: A local function multiplies the input grid by the multiplier grid to produce the output grid (- marks a blank NODATA cell)
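The two local functions illustrated in Fig. 2 and Fig. 3 can be sketched as below. NODATA cells are represented by `None`, which is a choice made for this example, and the function names are illustrative. The sample grid is a small hypothetical one rather than the grid from the figures.

```python
def local_multiply(grid, factor):
    """Multiply every cell by a constant; NODATA (None) stays NODATA."""
    return [[None if value is None else value * factor for value in row]
            for row in grid]


def local_multiply_grids(grid_a, grid_b):
    """Multiply two grids of the same shape cell by cell.

    The output is NODATA wherever either input is NODATA.
    """
    return [
        [None if a is None or b is None else a * b for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(grid_a, grid_b)
    ]


# Hypothetical 2 x 3 grids with one NODATA cell.
values = [[2, 0, 1], [4, None, 3]]
weights = [[1, 1, 2], [2, 2, 3]]
tripled = local_multiply(values, 3)
weighted = local_multiply_grids(values, weights)
```

Because each output cell depends only on the cells at the same position, other local functions (logarithms, trigonometric functions, logical tests) follow the same pattern with a different expression inside the list comprehension.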

Local functions are not limited to arithmetic computations. Trigonometric, exponential, logarithmic and logical expressions are all acceptable for defining local functions.

Focal Functions: Focal functions process cell data depending on the values of neighboring cells. For instance, computing the sum of a specified neighborhood and assigning the sum to the corresponding cell of the output grid is the “focal sum” function (figure 4). Here the neighborhood is defined by a 3 x 3 kernel. For cells closer to the edge where the regular kernel is not available, a reduced kernel is used and the sum is computed accordingly. For instance, the upper left corner cell is adjusted by a 2 x 2 kernel. Thus, the sum of the four values, 2, 0, 2 and 3, yields 7, which becomes the value of this cell in the output grid. The value of the second row, second column, is the sum of nine elements, 2, 0, 1, 2, 3, 0, 4, 2 and 2, and the sum equals 16.

Input Grid                       Output Grid

2   0   1   1                    7    8    9    6
2   3   0   4    Focal Sum  =    13   16   16   11
4   2   2   3                    13   18   20   14
1   1   3   2                    8    13   13   10

Fig. 4: A focal sum function sums the values of the specified neighborhood to produce the output grid
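The focal sum can be sketched with a 3 x 3 window that is clipped at the grid edges, as described for the reduced kernel above. The helper takes the statistic as a parameter so that the same loop also yields the closely related focal mean. The function name is illustrative; the input grid is the one used in the focal sum example.

```python
def focal(grid, statistic):
    """Apply `statistic` to the 3 x 3 neighborhood of every cell.

    Near the edges the window is reduced to the cells that exist.
    """
    n_rows, n_cols = len(grid), len(grid[0])
    output = []
    for r in range(n_rows):
        row = []
        for c in range(n_cols):
            window = [
                grid[i][j]
                for i in range(max(0, r - 1), min(n_rows, r + 2))
                for j in range(max(0, c - 1), min(n_cols, c + 2))
            ]
            row.append(statistic(window))
        output.append(row)
    return output


input_grid = [
    [2, 0, 1, 1],
    [2, 3, 0, 4],
    [4, 2, 2, 3],
    [1, 1, 3, 2],
]
focal_sum = focal(input_grid, sum)
focal_mean = focal(input_grid, lambda window: sum(window) / len(window))
```

Passing `max`, `min`, or a range or standard deviation function as the statistic gives the other focal functions in the same way.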

Another focal function is the mean of the specified neighborhood, the “focal mean” function. In the following example (Fig. 5), this function yields the mean of the eight adjacent cells and the center cell itself. This is the smoothing function to obtain the moving average in such a way that the value of each cell is changed into the average of the specified neighborhood.


Input Grid                        Output Grid

2   0   1   1                     1.8   1.3   1.5   1.5
2   3   0   4    Focal Mean  =    2.2   1.8   1.8   1.8
4   2   2   3                     2.2   2.0   2.2   2.3
1   1   3   2                     2.0   2.2   2.2   2.5

Fig. 5: A focal mean function computes the moving average of the specified neighborhood to produce the output grid

Other commonly employed focal functions include standard deviation (focal standard deviation), maximum (focal maximum), minimum (focal minimum), and range (focal range).

Zonal Functions: Zonal functions process the data of a grid in such a way that cells of the same zone are analyzed as a group. A zone consists of a number of cells that may or may not be contiguous. A typical zonal function requires two grids: a zone grid, which defines the size, shape and location of each zone, and a value grid, which is to be processed for analysis. In the zone grid, cells of the same zone are coded with the same value, while different zones are assigned different zone values. Figure 6 illustrates an example of the zonal function. The objective of this function is to identify the zonal maximum for each zone. In the input zone grid, there are only three zones with values ranging from 1 to 3. The zone with a value of 1 has five cells, three at the upper right corner and two at the lower left corner. The procedure involves finding the maximum value among these cells from the value grid.

            Zone Grid            Value Grid              Output Grid

            2   2   1   1        1   2   3   4           5   5   8   8
Zonal Max [ 2   3   3   1        5   6   7   8   ]  =    5   7   7   8
            3   2   -   -        1   2   3   4           7   5   -   -
            1   1   2   2        5   5   5   5           8   8   5   5

Fig. 6: A zonal maximum function identifies the maximum of each zone to produce the output grid (- marks a blank NODATA cell)
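A zonal maximum can be sketched in two passes: first collect the maximum of the value grid for every zone, then write that maximum back to every cell of the zone. The grids below are transcribed from the zonal maximum example; treating the last two cells of the third zone-grid row as NODATA (`None`) is an assumption made for this sketch. The function name is illustrative.

```python
def zonal_max(zone_grid, value_grid):
    """Return a grid in which every cell holds the maximum value of its zone."""
    maxima = {}
    # First pass: find the maximum value within each zone.
    for zone_row, value_row in zip(zone_grid, value_grid):
        for zone, value in zip(zone_row, value_row):
            if zone is not None and value is not None:
                maxima[zone] = max(value, maxima.get(zone, value))
    # Second pass: write each zone's maximum to all of its cells.
    return [[None if zone is None else maxima[zone] for zone in zone_row]
            for zone_row in zone_grid]


zone_grid = [
    [2, 2, 1, 1],
    [2, 3, 3, 1],
    [3, 2, None, None],
    [1, 1, 2, 2],
]
value_grid = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [1, 2, 3, 4],
    [5, 5, 5, 5],
]
zonal_output = zonal_max(zone_grid, value_grid)
```

Replacing `max` with a sum, mean or count of distinct values gives zonal sum, zonal mean and zonal variety along the same lines.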

Typical zonal functions include zonal mean, zonal standard deviation, zonal sum, zonal minimum, zonal maximum, zonal range, and zonal variety. Other statistical and geometric properties may also be derived from additional zonal functions. For instance, the zonal perimeter function calculates the perimeter of each zone and assigns the returned value to each cell of the zone in the output grid.

Global Functions: For global functions, the output value of each cell is a function of the entire grid. As an example, the Euclidean distance function computes the distance from each cell to the nearest source cell, where source cells are defined in an input grid. In a square grid, the distance between two orthogonal neighbors is equal to the size of a cell, or the distance between the centroid locations of adjacent cells. Likewise, the distance between two diagonal neighbors is equal to the cell size multiplied by the square root of 2. Distance between non-adjacent cells can be computed according to their row and column addresses. In figure 7, the grid at the left is the source grid, in which two clusters of source cells exist. The source cells labeled 1 are the first cluster, and the cell labeled 2 is a single-cell source. The Euclidean distance from any source cell is always equal to 0. For any other cell, the output value is the distance from its nearest source cell.

Source Grid                               Output Grid

-   -   1   1                             2.0   1.0   0.0   0.0
-   -   -   1    Euclidean distance  =    1.4   1.0   1.0   0.0
-   2   -   -                             1.0   0.0   1.0   1.0
-   -   -   -                             1.4   1.0   1.4   2.0

Fig. 7: A Euclidean distance function computes the distance from the nearest source cell (- marks a non-source cell)
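The Euclidean distance function can be sketched by brute force: for every cell, take the smallest straight-line distance to any source cell. The source grid below follows the example above, with non-source cells written as `None`; that representation and the function name are choices made for this sketch.

```python
import math


def euclidean_distance(source_grid, cell_size=1.0):
    """Distance from every cell to its nearest source cell (non-None cell)."""
    sources = [
        (r, c)
        for r, row in enumerate(source_grid)
        for c, value in enumerate(row)
        if value is not None
    ]
    return [
        [
            cell_size * min(math.hypot(r - sr, c - sc) for sr, sc in sources)
            for c in range(len(row))
        ]
        for r, row in enumerate(source_grid)
    ]


source_grid = [
    [None, None, 1, 1],
    [None, None, None, 1],
    [None, 2, None, None],
    [None, None, None, None],
]
distance_grid = euclidean_distance(source_grid)
rounded = [[round(d, 1) for d in row] for row in distance_grid]
```

Every output cell needs the positions of all source cells in the grid, which is why this operation is classed as a global function. For large grids a more efficient distance transform would normally be used in place of this exhaustive search.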

In the above example, the measurement of the distance from any cell must include the entire source grid; therefore this analytical procedure is a global function. Figure 8 provides an example of the cost distance function. The source grid is identical to that in the preceding illustration. However, this time a cost grid is employed to weigh travel cost. The value in each cell of the cost grid indicates the cost for travelling through that cell. Thus, the cost for travelling from the cell located in the first row, second column to its adjacent source cell to the right is half the cost of travelling through itself plus half the cost of travelling through the neighboring cell.


Source Grid          Cost Grid                Output Grid

-   -   1   1        2   2   4   4            5.0   3.0   0     0
-   -   -   1        4   4   3   3      =     3.5   2.5   2.8   0
-   2   -   -        2   1   4   1            1.5   0     2.5   2.0
-   -   -   -        2   5   3   3            2.1   3.0   2.8   4.0

Fig. 8: Travel cost for each cell is derived from the distance to the nearest source cell weighted by a cost function
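A cost distance grid can be sketched with a shortest-path search that spreads outwards from all source cells at once. The step rule follows the description above: moving between orthogonal neighbors costs half of each cell's cost, and this sketch additionally assumes that a diagonal step is multiplied by the square root of 2. The use of Dijkstra's algorithm and the function name are choices made here, not something the text prescribes.

```python
import heapq
import math


def cost_distance(source_grid, cost_grid):
    """Accumulated least travel cost from every cell to its nearest source."""
    n_rows, n_cols = len(cost_grid), len(cost_grid[0])
    best = [[math.inf] * n_cols for _ in range(n_rows)]
    heap = []
    for r in range(n_rows):
        for c in range(n_cols):
            if source_grid[r][c] is not None:
                best[r][c] = 0.0
                heapq.heappush(heap, (0.0, r, c))
    while heap:
        cost_so_far, r, c = heapq.heappop(heap)
        if cost_so_far > best[r][c]:
            continue  # an outdated queue entry
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if (dr or dc) and 0 <= nr < n_rows and 0 <= nc < n_cols:
                    # Half the cost of each cell, scaled by the step length.
                    step = math.hypot(dr, dc) * (cost_grid[r][c] + cost_grid[nr][nc]) / 2
                    if cost_so_far + step < best[nr][nc]:
                        best[nr][nc] = cost_so_far + step
                        heapq.heappush(heap, (best[nr][nc], nr, nc))
    return best


source_grid = [[None, None, 1, 1], [None, None, None, 1],
               [None, 2, None, None], [None, None, None, None]]
cost_grid = [[2, 2, 4, 4], [4, 4, 3, 3], [2, 1, 4, 1], [2, 5, 3, 3]]
travel_cost = [[round(v, 1) for v in row] for row in cost_distance(source_grid, cost_grid)]
```

Under these assumptions the rounded result agrees with the output grid of the cost distance example.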

Another useful global function is the cost path function, which identifies the least cost path from each selected cell to its nearest source cell in terms of cost distance. These global functions are particularly useful for evaluating the connectivity of a landscape and the proximity of a cell to any given entities.

SOME IMPORTANT RASTER ANALYSIS OPERATIONS

In this section some of the important raster-based analyses are dealt with:

Renumbering Areas in a Grid File
Performing a Cost Surface Analysis
Performing an Optimal Path Analysis
Performing a Proximity Search

Area Numbering : Area Numbering assigns a unique attribute value to each area in a specified grid file. An area consists of two or more adjacent cells that have the same cell value, or a single cell with no adjacent cell of the same value. For a group of cells with the same value to be considered adjacent, a cell must have a cell of the same value on at least one side of it horizontally or vertically (4-connectivity), or on at least one side horizontally, vertically, or diagonally (8-connectivity). Figure 9 shows a simple example of area numbering.
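Area numbering as described above amounts to labelling connected groups of equal-valued cells. The sketch below assumes that areas are numbered from 1 in the order in which they are first met during a left-to-right, top-to-bottom scan. The function name and the small sample grid are hypothetical.

```python
from collections import deque


def number_areas(grid, connectivity=4):
    """Give every connected group of equal-valued cells its own number."""
    if connectivity == 4:
        offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    else:
        offsets = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1) if dr or dc]
    n_rows, n_cols = len(grid), len(grid[0])
    labels = [[0] * n_cols for _ in range(n_rows)]  # 0 means not yet numbered
    next_label = 0
    for r in range(n_rows):
        for c in range(n_cols):
            if labels[r][c]:
                continue
            next_label += 1
            labels[r][c] = next_label
            queue = deque([(r, c)])
            while queue:  # flood-fill the area that contains (r, c)
                cr, cc = queue.popleft()
                for dr, dc in offsets:
                    nr, nc = cr + dr, cc + dc
                    if (0 <= nr < n_rows and 0 <= nc < n_cols
                            and not labels[nr][nc]
                            and grid[nr][nc] == grid[r][c]):
                        labels[nr][nc] = next_label
                        queue.append((nr, nc))
    return labels


bitmap = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
four_connected = number_areas(bitmap, connectivity=4)
eight_connected = number_areas(bitmap, connectivity=8)
```

With 4-connectivity the diagonal of 1 values is split into three separate areas, whereas with 8-connectivity it forms a single area.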


[Figure 9 panels: Input Map, Result 4-connected, Result 8-connected]

Figure 9. Simple example of area numbering with a bit map as input. The pixels which are connected are assigned the same code. Different results are obtained depending on whether only the horizontal and vertical neighbors are considered (4-connected) or all neighbors are considered (8-connected).

You can renumber all of the areas in a grid, or you can renumber only those areas that have one or more specific values. If you renumber all of the areas, Area Number assigns a value of 1 to the first area located. It then assigns a value of 2 to the second area, and continues this reassignment method until all of the areas are renumbered. When you renumber areas that contain a specified value (such as 13), the first such area is assigned the maximum grid value plus 1. For example, if the maximum grid value is 25, Area Number assigns a value of 26 to the first area, a value of 27 to the second area, and continues until all of the areas that contain the specified values are renumbered.

Cost Surface Analysis : Cost Surface generates a grid in which each grid cell represents the cost to travel to that grid cell from the nearest of one or more start locations. The cost of traveling to a given cell is determined from a weight grid file. The Zero Weights option uses attribute values of 0 as the start locations. The By Row/Column option uses the specified row and column location as the start location.

You can then use Optimal Path to calculate the best path between two points. The user-type code of the output file is 0 (generic grid file) and the data-type code is based on the range of values. Cost Surface interprets negative, void, zero and positive values in the input weights file as described here:

Void values and values less than 0 : These values are assigned the maximum cost surface value, which is 255 for a byte file, 65,535 for a 2-byte file, or 4,294,967,295 for a 4-byte file. All void and negative input values are considered “masked off”. The execution time increases with the number of masked values.

Zero values : Zero values are considered to be start locations. When Cost Surface is executed, the software begins at all start locations (namely, all 0 values and, if specified, the user-defined start location).

Positive values : These values are interpreted as the cost of traveling across each cell.

Example : A developer wants to estimate the cost of laying a pipeline across an area that includes residential, urban, commercial, and bedrock areas. The landcover grid file is shown in the following figure 10:

Input Grid File

                       Columns
Rows    1    2    3    4    5    6    7    8    9    10
 1      5    13   1    1    13   13   11   3    5    7
 2      5    5    5    5    13   3    3    3    3    7
 3      3    5    15   15   3    11   1    1    11   5
 4      11   13   15   15   3    13   1    1    11   11
 5      3    3    13   5    5    11   3    15   3    3
 6      3    3    13   13   7    7    5    5    3    1
 7      1    15   5    5    5    5    15   15   5    5
 8      13   7    15   15   11   11   15   15   7    7
 9      15   15   5    7    15   7    13   13   9    9
10      15   15   13   5    9    9    13   13   9    15

1 = Transportation
3 = Agriculture
5 = Forest
7 = Water
9 = Residential
11 = Urban
13 = Commercial
15 = Bedrock


The developer used Overlay to create the following weights file.

Weights file

                              Columns
Rows    1      2      3      4      5      6      7      8      9      10
 1      30     2000   10     10     2000   2000   2000   20     30     40
 2      30     30     30     30     2000   20     20     20     20     40
 3      20     30     2000   2000   20     2000   10     10     2000   40
 4      2000   2000   2000   2000   20     2000   10     10     2000   2000
 5      20     20     2000   30     30     2000   20     2000   20     20
 6      20     20     2000   2000   40     40     30     30     20     10
 7      10     2000   30     30     30     30     2000   2000   30     30
 8      2000   40     2000   2000   2000   2000   30     2000   40     40
 9      2000   2000   30     40     2000   40     2000   2000   2000   2000
10      2000   2000   2000   30     2000   2000   2000   2000   2000   2000

Figure 10: Example of Cost surface analysis

Where,
1 is weighted to 10
3 is weighted to 20
5 is weighted to 30
7 is weighted to 40
9, 11, 13, 15 are weighted to 2000

The developer then input this weights file to Cost Surface, using Row 1/Column 1 as the start location, to generate the following cost surface grid file (Fig. 11). The increasing values in the cost surface file represent the cumulative cost for laying the pipeline.


Cost Surface Grid

                              Columns
Rows    1      2      3      4      5      6      7      8      9      10
 1      0      1015   71     81     1086   1173   1193   211    236    258
 2      30     43     73     99     1114   163    183    203    216    246
 3      55     73     1088   1114   135    1145   184    194    1199   251
 4      1065   1088   1508   1165   155    1165   194    198    1203   1250
 5      358    350    1205   190    180    1195   209    1203   220    240
 6      350    330    1309   1205   215    229    243    245    240    241
 7      351    1309   294    264    250    264    1249   1260   265    261
 8      1356   344    1309   1279   1265   1279   307    1320   300    296
 9      1786   1364   393    428    1376   356    1322   1742   1320   1316
10      3786   1829   1408   436    1451   1376   1799   3742   3320   3316

Figure 11: Example of Cost surface analysis

Note : Cost is measured from the center of one cell to the center of the next cell. Thus, for (1,1) to (1,2), add 15 (half of 30) to 1000 (half of 2000) and the result is 1015.

Optimal Path : Optimal Path lets us analyze a grid file to find the best path between a specified location and the closest start location as used in generating a cost surface. The computation is based on a cost surface file that you generate with Cost Surface. One must specify the start location by row and column. The zeros in the input cost surface represent one endpoint. The specified start location represents the other endpoint. The path is generated by testing the values of neighboring cells for the smallest value. When the smallest value is found, the path moves to that location, where it repeats the process to move to the next cell. The output is the path of least resistance between two points: the least expensive, but not necessarily the straightest, line between the two endpoints. The output file consists of only the output path attribute value, which can be optionally specified, surrounded by void values.

Example : A company that is planning to construct a pipeline through an urbanized area is interested in finding the best route for the pipeline. They plan to do this using a cost surface file (shown below) created for this project.

Cost Surface Grid

                              Columns
Rows    1      2      3      4      5      6      7      8      9      10
 1      0      1015   71     81     1086   1173   1193   211    236    258
 2      30     43     73     99     1114   163    183    203    216    246
 3      55     73     1088   1114   135    1145   184    194    1199   251
 4      1065   1088   1508   1165   155    1165   194    198    1203   1250
 5      358    350    1205   190    180    1195   209    1203   220    240
 6      350    330    1309   1205   215    229    243    245    240    241
 7      351    1309   294    264    250    264    1249   1260   265    261
 8      1356   344    1309   1279   1265   1279   307    1320   300    296
 9      1786   1364   393    428    1376   356    1322   1742   1320   1316
10      3786   1829   1408   436    1451   1376   1799   3742   3320   3316

From the cost surface grid, the optimal path from row 10, column 10 to row 1, column 1 was found, as shown in the following grid.

Optimal Path Grid

                              Columns
Rows    1      2      3      4      5      6      7      8      9      10
 1      1      VOID   1      VOID   VOID   VOID   VOID   VOID   VOID   VOID
 2      VOID   1      VOID   1      VOID   1      VOID   VOID   VOID   VOID
 3      VOID   VOID   VOID   VOID   1      VOID   1      VOID   VOID   VOID
 4      VOID   VOID   VOID   VOID   VOID   VOID   VOID   1      VOID   VOID
 5      VOID   VOID   VOID   VOID   VOID   VOID   VOID   VOID   1      VOID
 6      VOID   VOID   VOID   VOID   VOID   VOID   VOID   VOID   1      VOID
 7      VOID   VOID   VOID   VOID   VOID   VOID   VOID   VOID   VOID   1
 8      VOID   VOID   VOID   VOID   VOID   VOID   VOID   VOID   VOID   1
 9      VOID   VOID   VOID   VOID   VOID   VOID   VOID   VOID   VOID   1
10      VOID   VOID   VOID   VOID   VOID   VOID   VOID   VOID   VOID   1

Figure 12: Example of Optimal Path Analysis

In this grid, the 1 values represent the best, or optimal, path to construct the pipeline between row 1, column 1 and row 10, column 10. The VOID values are background values (-128 for byte, -32768 for short, and -2147483648 for long files).
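The path-tracing rule described for Optimal Path (repeatedly move to the neighboring cell with the smallest cost surface value until a zero is reached) can be sketched as follows. The cost surface values are transcribed from the cost surface grid of this example; the function name and the zero-based (row, column) indexing are choices made for the sketch, so row 10, column 10 becomes `(9, 9)`.

```python
COST_SURFACE = [
    [0, 1015, 71, 81, 1086, 1173, 1193, 211, 236, 258],
    [30, 43, 73, 99, 1114, 163, 183, 203, 216, 246],
    [55, 73, 1088, 1114, 135, 1145, 184, 194, 1199, 251],
    [1065, 1088, 1508, 1165, 155, 1165, 194, 198, 1203, 1250],
    [358, 350, 1205, 190, 180, 1195, 209, 1203, 220, 240],
    [350, 330, 1309, 1205, 215, 229, 243, 245, 240, 241],
    [351, 1309, 294, 264, 250, 264, 1249, 1260, 265, 261],
    [1356, 344, 1309, 1279, 1265, 1279, 307, 1320, 300, 296],
    [1786, 1364, 393, 428, 1376, 356, 1322, 1742, 1320, 1316],
    [3786, 1829, 1408, 436, 1451, 1376, 1799, 3742, 3320, 3316],
]


def trace_optimal_path(surface, start):
    """Follow the smallest neighboring value from `start` down to a zero cell."""
    n_rows, n_cols = len(surface), len(surface[0])
    r, c = start
    path = [start]
    while surface[r][c] > 0:
        neighbours = [
            (r + dr, c + dc)
            for dr in (-1, 0, 1)
            for dc in (-1, 0, 1)
            if (dr or dc) and 0 <= r + dr < n_rows and 0 <= c + dc < n_cols
        ]
        nr, nc = min(neighbours, key=lambda cell: surface[cell[0]][cell[1]])
        if surface[nr][nc] >= surface[r][c]:
            break  # no cheaper neighbor: stop rather than loop forever
        r, c = nr, nc
        path.append((r, c))
    return path


pipeline_route = trace_optimal_path(COST_SURFACE, start=(9, 9))
```

Traced this way from row 10, column 10, the route passes through the same cells that are marked 1 in the optimal path grid above.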


Performing A Proximity Search: Proximity lets you search a grid file for all occurrences of a cell value or a feature within either a specified distance or a specified number of cells from the origin. You can set both the origin and the target to a single value or a set of values. The number of cells to find can also be limited. For example, if you specify 10 cells, the search stops when 10 occurrences of the target have been found within the specified distance of each origin value. If you do not limit the number of cells, the search continues until all target values are located.

The output grid file has the user-type code and the data-type code of the input file. The grid-cell values in the output file indicate whether the grid cell corresponds to an origin value, to a target value located within the specified distance, or to neither of these. The origin and target values may be retained as the original values or set to another value.

Example: In the following example, a land speculator wants to begin building on cleared land, but county regulations require new buildings to be located within 1.5 miles of a fire hydrant. The speculator’s holdings have been mapped on the following grid:

Input Grid File

                              Columns
Rows     1     2     3     4     5     6     7     8     9    10
1        4     4     3     3     3     3     3     3     3     3
2        4     4    10     3     3     4     3     3     3     3
3        3     3     3     3     3     4     4    10     4     5
4        5     3     3     4     4     4     5     5     4     5
5        5     5     3     5     4     4     4     3     3     3
6        5     3     3     5     4     4     4     3     3     3
7        5     5     4     4    10     4     4     3     3     4
8        5     5     4     5     4     4     4     4     4     4
9        5    10     5     4     4     3     3     3     3     3
10       5     5     4     4     4     4     3     3     3     3

Figure 13

Each cell = 1 square mile
3 = Dry Brush
4 = Mixed Timber
5 = Cleared Land
10 = Fire Hydrant Site


Because the speculator does not want to install new fire hydrants or clear more land, the following input and output values were specified:

Origin Value: 10 (Feature: Fire Hydrant Site)
Target Value: 5 (Feature: Cleared Land)
Distance: 1.5 miles
Background Value: -128 (Void for a signed byte file)

The following output file was generated:

Output Grid File

                              Columns
Rows     1     2     3     4     5     6     7     8     9    10
1     -128  -128  -128  -128  -128  -128  -128  -128  -128  -128
2     -128  -128    10  -128  -128  -128  -128  -128  -128  -128
3     -128  -128  -128  -128  -128  -128  -128    10  -128  -128
4     -128  -128  -128  -128  -128  -128     5     5  -128  -128
5     -128  -128  -128  -128  -128  -128  -128  -128  -128  -128
6     -128  -128  -128     5  -128  -128  -128  -128  -128  -128
7     -128  -128  -128  -128    10  -128  -128  -128  -128  -128
8        5     5  -128     5  -128  -128  -128  -128  -128  -128
9        5    10     5  -128  -128  -128  -128  -128  -128  -128
10       5     5  -128  -128  -128  -128  -128  -128  -128  -128

Figure 14: Example of Proximity Search Analysis

Each cell = 1 square mile
5 = Cleared land within 1.5 miles of a fire hydrant
10 = Fire Hydrant site
-128 = Other
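The proximity search in this example can be reproduced with a short Python sketch. It is a minimal illustration under stated assumptions, not the actual tool: the function name `proximity_search` is hypothetical, distance is taken as the straight-line distance between cell centers, each cell is assumed to be 1 mile on a side (from "each cell = 1 square mile"), and the optional limit on the number of cells found is left out.

```python
import math

# Input grid from the land speculator example
# (3 = dry brush, 4 = mixed timber, 5 = cleared land, 10 = fire hydrant site).
LAND = [
    [4, 4, 3, 3, 3, 3, 3, 3, 3, 3],
    [4, 4, 10, 3, 3, 4, 3, 3, 3, 3],
    [3, 3, 3, 3, 3, 4, 4, 10, 4, 5],
    [5, 3, 3, 4, 4, 4, 5, 5, 4, 5],
    [5, 5, 3, 5, 4, 4, 4, 3, 3, 3],
    [5, 3, 3, 5, 4, 4, 4, 3, 3, 3],
    [5, 5, 4, 4, 10, 4, 4, 3, 3, 4],
    [5, 5, 4, 5, 4, 4, 4, 4, 4, 4],
    [5, 10, 5, 4, 4, 3, 3, 3, 3, 3],
    [5, 5, 4, 4, 4, 4, 3, 3, 3, 3],
]


def proximity_search(grid, origin, target, max_dist, cell_size=1.0, void=-128):
    """Keep origin cells and target cells within max_dist of any origin.

    Distance is measured between cell centers (an assumption of this
    sketch); every other cell is set to the background value `void`.
    """
    n_rows, n_cols = len(grid), len(grid[0])
    origins = [
        (r, c) for r in range(n_rows) for c in range(n_cols) if grid[r][c] == origin
    ]
    out = [[void] * n_cols for _ in range(n_rows)]
    for r, c in origins:
        out[r][c] = origin
    for r in range(n_rows):
        for c in range(n_cols):
            if grid[r][c] != target:
                continue
            if any(
                math.hypot(r - o_r, c - o_c) * cell_size <= max_dist
                for o_r, o_c in origins
            ):
                out[r][c] = target
    return out


result = proximity_search(LAND, origin=10, target=5, max_dist=1.5)
for row in result:
    print(" ".join(f"{cell:>5}" for cell in row))
```

With a 1.5-mile limit, a cleared cell qualifies when it touches a hydrant cell along an edge (1 mile) or at a corner (about 1.41 miles); under these assumptions the result matches the output grid file of the example.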

Cells containing the value 5 (Cleared Land) are those that meet all of the criteria and thus represent the most appropriate sites.

GRID-BASED SPATIAL ANALYSIS

Diffusion modelling and connectivity analysis can be conducted effectively from grid data. Grid analysis is suitable for these types of problems because of the grid’s regular spatial configuration of geographic units.

Diffusion Modelling: Diffusion modelling deals with the process underlying a spatial distribution. The constant distance between adjacent units makes it possible to simulate progression over geographic units at a consistent rate. Diffusion modelling has a variety of possible applications, including wildfire management, disease vector tracking, migration studies, and innovation diffusion research.
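To illustrate how the constant spacing of grid cells supports simulation at a consistent rate, the following Python sketch spreads a front outward by one cell per time step. It is a deliberately simple illustration, not a model taken from the text: the function name `diffuse`, the edge-adjacent (four-neighbor) spread rule, and the optional set of barrier cells are all assumptions.

```python
from collections import deque


def diffuse(n_rows, n_cols, sources, barriers=frozenset()):
    """Return arrival times (in steps) of a front spreading one cell per step.

    Spread is to edge-adjacent cells only (an assumption of this sketch).
    Cells that are never reached, including barriers, keep the value None.
    """
    arrival = [[None] * n_cols for _ in range(n_rows)]
    queue = deque()
    for r, c in sources:
        arrival[r][c] = 0
        queue.append((r, c))
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            if (
                0 <= nr < n_rows
                and 0 <= nc < n_cols
                and (nr, nc) not in barriers
                and arrival[nr][nc] is None
            ):
                arrival[nr][nc] = arrival[r][c] + 1
                queue.append((nr, nc))
    return arrival


# A front starting in the middle of a 3 x 3 grid (0-based row, column).
for row in diffuse(3, 3, sources=[(1, 1)]):
    print(row)
```

Each value in the result is the time step at which the front first reaches that cell, so cells with equal values form the position of the front at that moment. A realistic model would replace the uniform one-cell-per-step rule with rates that depend on the cell attributes.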


Connectivity Analysis: Connectivity analysis evaluates interseparation distance, which is difficult to calculate in a polygon coverage, but can be obtained much more effectively in a grid.
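One simple way to examine connectivity on a grid is to group the cells of one feature type into patches of adjacent cells and compare patch sizes. The Python sketch below is an assumption-laden illustration rather than a method given in the text: the function names `patches` and `largest_patch_share`, the edge-adjacency rule, and the small forest grid are all hypothetical.

```python
def patches(grid, feature):
    """Group cells equal to `feature` into patches of edge-adjacent cells."""
    n_rows, n_cols = len(grid), len(grid[0])
    seen = set()
    found = []
    for r in range(n_rows):
        for c in range(n_cols):
            if grid[r][c] != feature or (r, c) in seen:
                continue
            # Flood-fill outward from this unvisited feature cell.
            stack, patch = [(r, c)], []
            seen.add((r, c))
            while stack:
                cr, cc = stack.pop()
                patch.append((cr, cc))
                for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    nr, nc = cr + dr, cc + dc
                    if (
                        0 <= nr < n_rows
                        and 0 <= nc < n_cols
                        and grid[nr][nc] == feature
                        and (nr, nc) not in seen
                    ):
                        seen.add((nr, nc))
                        stack.append((nr, nc))
            found.append(patch)
    return found


def largest_patch_share(grid, feature):
    """Fraction of the feature's cells that lie in its largest patch."""
    found = patches(grid, feature)
    total = sum(len(p) for p in found)
    return max(len(p) for p in found) / total if total else 0.0


# Hypothetical landscape: "F" = dense forest, "." = anything else.
FOREST = [
    "FF..",
    "F..F",
    "...F",
]
print(len(patches(FOREST, "F")), largest_patch_share(FOREST, "F"))
```

A share close to 1 means most of the feature lies in a single connected patch, while many small patches indicate a fragmented landscape. Measuring the separation distance between patches would be a natural extension of the same patch list.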

The connectivity of a landscape measures the degree to which surface features of a certain type are connected. Landscape connectivity is an important concern in environmental management. In some cases, effective management of natural resources requires maximum connectivity of specific features. For instance, a sufficiently large area of dense forest must be well connected to provide a habitat in which some endangered species can survive. In such cases, forest management policies must be set to maintain the highest possible level of connectivity. Connectivity analysis is especially useful for natural resource and environmental management.
