Representation of spatial data

GIS architecture, raster and vector, conversion, administrative subdivisions: polygon-ring, topology, extended DCEL, continuous data: contours, DEMs, TINs

Thematic map layers

• Separate storage of data according to theme: map layers

• A GIS typically uses tens to hundreds of map layers

• For example: municipality borders, land use, cadastral boundaries, water pipes, churches, etc.

Example map layers

Census data, 1995 (U.S.A.)

Geometry, topology and attributes

• Geometry: coordinates

• Topology: adjacency relations of objects

• Attributes: properties, values

Example: Country map of South America

• Geometry: coordinates of the borders

• Topology: which countries border which

• Attributes: names of countries, population, etc.

GIS data architectures

• Structure/philosophy of how geometrical/topological and attribute information should be stored and made accessible

1. Twofold or dual architecture

2. Layered architecture

3. Integrated architecture

Pure database approach

Geometry and attributes in same relational data model

+ More concurrent users possible

- Objects must be reduced to atomic parts, and partitioned over various tables

- Retrieving original objects is expensive (join)

- Query language doesn't know spatial concepts (area, intersects, …)

Twofold architecture

• Attributes in a DBMS

• Geometry in separate files (by theme)

• Connection by unique identifier

• Two subsystems: the DBMS and one for the geometry

E.g. ArcGIS

Twofold: pros and cons

+ DB generally known (at organizations)

+ Geometry fast and easily accessible

- More users difficult to incorporate concurrently

- Maintaining consistency between 2 systems is tricky

- Efficient transactions (optimizations) tricky because of two systems

Layered architecture

• RDBMS with additional, geographically intelligent layer

• Layer contains extension with geographical datatypes, e.g. point, point cluster, polygon, line

• Layer offers extension to query language, and translates for the actual RDBMS

Layered: pros and cons

+ More concurrent users possible

+ Spatial object types and concepts are present

- Middle layer not extendible

- Topological relations must be determined when they are needed

- Practice: no object type for subdivision

Integrated architecture

• Spatial object types and functions in the database itself

• RDBMS or OO

E.g. Postgres

Integrated: pros and cons

+ No translation in middle layer necessary

+ Extendible with additional types and functions

- Extension is rather complex

- Practice: less GIS-functionality present by default

Representation of geometry

• Two main approaches: raster and vector

• Both can be mixed in a GIS; the choice can be made per map layer

• Conversion raster-vector and vice versa possible

• Representation depends on type of data, way of acquisition, desired operations, etc.

Raster structure

• Division of space into equal-size cells (squares, pixels)

• Theme gives cells a value (nominal, ordinal, interval, ratio, vector, …)

• Cells should not contain any further spatial information (more detail)

Data in raster form

Point object in raster form

Line object in raster form

Plane object in raster form

Raster maps

Raster: pros and cons

+ Simple structure

+ Simple operations

+ Obtained after scanning, remote sensing

- Less suitable for point and line objects: representation does not follow intuition

- Network analysis difficult

- Not adaptive: no difference in detail possible in different regions

- Either expensive in memory, or little precision

- Not obtained after digitizing

Raster: memory reduction

• Run-length encoding: no 2-dim array but coding start pixel with value and length of run

• Block encoding: 2-dim version

• Disadvantage: makes structure and operations much more complex

Run-length example: (34,67) forest 9

Block example: (34,67) forest 4,6
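As a rough illustration of run-length encoding, the sketch below encodes one raster row as runs of equal values. The tuple layout (start x, row y, value, run length) mirrors the "(34,67) forest 9" example only loosely; the function names and the sample row are assumptions made for this sketch.

```python
from itertools import groupby


def rle_encode_row(row, y):
    """Encode one raster row as (start_x, y, value, run_length) tuples."""
    runs = []
    x = 0
    for value, group in groupby(row):
        length = len(list(group))
        runs.append((x, y, value, length))
        x += length
    return runs


def rle_decode_row(runs, width):
    """Rebuild the full row of cell values from its runs."""
    row = [None] * width
    for x, _y, value, length in runs:
        row[x:x + length] = [value] * length
    return row


# Hypothetical row: 14 cells shrink to 3 runs.
row = ["water"] * 3 + ["forest"] * 9 + ["urban"] * 2
runs = rle_encode_row(row, y=67)
```

Reading the value of a single cell now requires searching the runs instead of indexing an array, which is the added complexity the slide refers to.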

Vector structure

• Objects stored as points, lines and areas

• Points have coordinates; lines connect points; areas are delimited by lines

• Attributes are stored with the objects (point, line or areal)

Vector: pros and cons

+ Elegant structure; fits point, line and areal objects alike

+ Small storage consumption

+ Precise

+ Adaptive: additional control points possible

+ Network and cluster analysis possible

+ Obtained after digitizing

- Relatively complex

- Map overlay and buffer computation complex

Vector representation of a region

• Not necessarily simply-connected:

– NL has islands

– NL has holes (Baarle-Nassau / Baarle-Hertog); there are even regions in these holes

Representation of subdivisions

Subdivisions: spaghetti model

• Every chain is represented by a list with coordinate pairs

• Split nodes are doubly stored

• Areas are not present explicitly

C1: (..,..), (..,..), (..,..), ...

C2: (..,..), (..,..), (..,..), ...

C3: (..,..), (..,..), (..,..), ...

[Figure: subdivision with chains C1 to C6]

Subdivisions: polygon ring structure

• Every area is represented by a list with coordinate pairs

• Control points are doubly stored

• Neighbor areas are difficult to determine

• Consistency is difficult to maintain

[Figure: subdivision with polygons P1, P2 and P3]

P1: (..,..), (..,..), (..,..), ...

P2: (..,..), (..,..), (..,..), ...

P3: (..,..), (..,..), (..,..), ...

Subdivisions: topological structure

• Nodes are objects with coordinates

• Edges are connections of nodes

• Sequences of edges along polygon boundaries are connected

• Polygons are objects of which the boundary is stored

Doubly-connected edge list
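The following is a minimal sketch of the idea behind a doubly-connected edge list, not a full implementation: each edge is stored as two half-edges that point to each other (twin), to the next half-edge along the same polygon boundary, and to the polygon they bound. The class and function names are assumptions for this sketch; vertex coordinates, previous pointers and the outer face are left out for brevity.

```python
class HalfEdge:
    """One direction of an edge, bounding exactly one polygon."""

    def __init__(self, origin, face):
        self.origin = origin  # id of the start node
        self.face = face      # name of the polygon to the left
        self.twin = None      # same edge, opposite direction
        self.next = None      # next half-edge along the same boundary


def build_half_edges(faces):
    """faces maps a polygon name to its node ids in counter-clockwise order."""
    by_endpoints = {}
    first_edge = {}
    for name, ring in faces.items():
        edges = [HalfEdge(v, name) for v in ring]
        for i, e in enumerate(edges):
            e.next = edges[(i + 1) % len(edges)]
            dest = ring[(i + 1) % len(ring)]
            by_endpoints[(e.origin, dest)] = e
        first_edge[name] = edges[0]
    for (a, b), e in by_endpoints.items():
        e.twin = by_endpoints.get((b, a))  # stays None on the outer boundary
    return first_edge


def neighbors(first_edge, name):
    """Walk the boundary of one polygon and collect the adjacent polygons."""
    result = set()
    e = start = first_edge[name]
    while True:
        if e.twin is not None:
            result.add(e.twin.face)
        e = e.next
        if e is start:
            break
    return result


# Three hypothetical triangles; P1 shares an edge with both P2 and P3.
faces = {"P1": [0, 1, 2], "P2": [1, 3, 2], "P3": [0, 2, 4]}
first = build_half_edges(faces)
```

With this structure the neighbors of a polygon follow from a single walk along its boundary, which is exactly what is difficult in the polygon ring structure.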

Subdivisions: topological chain structure

• Splitting nodes are objects with coordinates

• Chains are connections of splitting nodes and contain zero or more nodes with coordinates

• Sequences of chains along polygon boundaries are connected

• Polygons are objects of which the boundary is stored

Doubly-connected chain list

Vector structures

                Memory   Duplication   Polygon retrieve   Topology retrieve
Spaghetti       ++       +             --                 -
Polygon ring    -        --            ++                 -
DC edge list    --       ++            -                  +
DC chain list   ++       ++            +                  ++

Raster-vector conversion

• Vector-to-raster: Like in computer graphics: scan-conversion of lines, etc.

• Raster-to-vector: Consider pixel sides between pixels with different values as boundary and put them in a vector representation. Follow-up steps: thinning, line simplification

E.g. for data integration
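For the vector-to-raster direction, one well-known scan-conversion method from computer graphics is Bresenham's line algorithm. The slides do not name a specific method, so the choice here is an assumption; the sketch returns the raster cells covered by a segment between two cell coordinates.

```python
def rasterize_segment(x0, y0, x1, y1):
    """Return the grid cells on the segment from (x0, y0) to (x1, y1).

    Integer-only Bresenham variant that handles all directions.
    """
    cells = []
    dx = abs(x1 - x0)
    dy = -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy
    while True:
        cells.append((x0, y0))
        if x0 == x1 and y0 == y1:
            break
        e2 = 2 * err
        if e2 >= dy:  # step in x
            err += dy
            x0 += sx
        if e2 <= dx:  # step in y
            err += dx
            y0 += sy
    return cells


line_cells = rasterize_segment(0, 0, 5, 2)
```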

[Figures: thinning; raster-vector conversion]

Line simplification

• Douglas-Peucker algorithm from 1973

• Input: chain p1, …, pn and error ε

[Figure: chain from p1 to pn]

DP-algorithm

• Draw line segment between first and last point

• If all points in between are within error: ready

• Otherwise, determine the farthest point and recursively continue on the part up to the farthest point and the part after the farthest point

DP-algorithm

DP-standard(i, j, ε)
    Determine farthest point pk between pi and pj
    If distance(pk, pi pj) > ε then
        DP-standard(i, k, ε)
        DP-standard(k, j, ε)
        Return the concatenation of the simplifications
    Else
        Return the segment pi pj
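The pseudocode above can be sketched in Python as follows. This is an illustration under stated assumptions: distance is measured to the line segment pi pj (not the infinite line), the error is passed as `eps`, and the function names and the sample chain are made up for this example.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment from a to b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))  # clamp the projection onto the segment
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def dp_standard(points, i, j, eps):
    """Simplify points[i..j]; the result always keeps both end points."""
    if j - i < 2:
        return [points[i], points[j]]
    # Farthest intermediate point from the segment pi pj.
    k = max(range(i + 1, j),
            key=lambda m: point_segment_distance(points[m], points[i], points[j]))
    if point_segment_distance(points[k], points[i], points[j]) > eps:
        left = dp_standard(points, i, k, eps)
        right = dp_standard(points, k, j, eps)
        return left[:-1] + right  # pk is shared, keep it once
    return [points[i], points[j]]


chain = [(0, 0), (1, 0.1), (2, 0), (3, 2), (4, 0)]
simplified = dp_standard(chain, 0, len(chain) - 1, eps=0.5)
```

In this example only the point (1, 0.1) is dropped; with a larger error such as 3 the whole chain collapses to its two end points.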

Properties of the DP-algorithm

• DP-algorithm does not minimize the number of points in the simplification

[Figure: output of the DP-algorithm compared with an optimal simplification]


• Determining farthest point takes O(n) time

• Whole algorithm takes T(n) = T(m) + T(n-m+1) + O(n), T(2) = O(1) time, splitting in m and n-m+1 points

• "Fair" split gives O(n log n) time

• Worst case gives quadratic time
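The two bounds follow from the recurrence by substituting the two extreme splits. The derivation below is a sketch that ignores the one point shared by both parts in the fair case.

```latex
\begin{align*}
\text{Fair split } (m \approx n/2):\quad
  & T(n) = 2\,T(n/2) + O(n) \;\Rightarrow\; T(n) = O(n \log n) \\
\text{Worst case } (m = 2):\quad
  & T(n) = T(2) + T(n-1) + O(n) = T(n-1) + O(n) \\
  & \Rightarrow\; T(n) = T(2) + \sum_{k=3}^{n} O(k) = O(n^2)
\end{align*}
```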


• DP-algorithm may give self-intersections in the output

Solution: test output for self-intersections and continue adding control points if necessary

Improved DP-algorithm

DP-improved(i, j, ε)
    Simp = DP-standard(i, j, ε)
    V = set of intersecting segments of Simp
    Repeat
        For all segments s ∈ V
            Refine(s) in Simp (do 1 refinement à la DP by adding the farthest point)
        V = set of intersecting segments of Simp
    Until V is empty

Continuous data representation

• Data on interval or ratio measurement scale

• Data values of nearby points will usually not be very different

• Representation is necessarily an approximation: finite representation of information with infinite detail

• Raster (1x) or vector (2x)

Digital Elevation Model (DEM)

Elevation models

• (Elevation) grid: raster

• Contour line model: vector

• Triangulation (TIN; triangulated irregular network): vector

Grid elevation model

TIN elevation model

Elevation models

• Contour model well-suited for visualisation, not for representation or storage

• Interpretations grid:

– elevation whole cell: not a continuous model

– elevation middle cell: interpolation needed; how?

• Advantage grid: simple storage, operations simple too

• Advantage TIN: more efficient in storage, adaptive

Interpolation for grid

20 18

22 18

(20 + 18 + 18 + 22) / 4 = 19.5

[Figures: the same grid cell with elevations 20, 18, 22, 18, interpolated in three ways]

• Linear interpolation; saddle point problem

• Linear interpolation; additional point

• Non-linear interpolation
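The small sketch below illustrates why the choice matters for the cell with elevations 20, 18, 22, 18. Placing these values at the north-west, north-east, south-west and south-east corners is an assumption, as are the function names. Bilinear interpolation is included as one common non-linear interpolant; the slides do not say which non-linear method they show.

```python
def center_by_diagonal(nw, ne, sw, se):
    """Center elevation when the cell is split into two triangles.

    Returns one value per choice of diagonal: (NW-SE, NE-SW).
    """
    return (nw + se) / 2, (ne + sw) / 2


def center_by_average(nw, ne, sw, se):
    """Center elevation as an additional point: the mean of the four corners."""
    return (nw + ne + sw + se) / 4


def bilinear(nw, ne, sw, se, s, t):
    """Bilinear interpolation; s runs west to east, t north to south, both in [0, 1]."""
    north = nw + s * (ne - nw)
    south = sw + s * (se - sw)
    return north + t * (south - north)


corners = (20, 18, 22, 18)  # assumed layout: NW, NE, SW, SE
diag_a, diag_b = center_by_diagonal(*corners)
avg = center_by_average(*corners)
mid = bilinear(*corners, 0.5, 0.5)
```

With linear interpolation over two triangles, the center elevation depends on which diagonal is chosen (19 versus 20 here), whereas adding the cell center as an extra point with the average elevation removes that arbitrary choice.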

Topological TIN structure

• With explicit vertex and triangle representation

[Figure: triangle t with neighboring triangles t1, t2, t3 and vertices u, v, w; vertices store x, y-coordinates and elevation]

Because t1 has pointers to two of the same vertices as t, we can determine their shared edge, even though it is not represented explicitly
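A minimal sketch of this structure, assuming each triangle record holds references to its three vertices and to up to three neighboring triangles (the class and field names are illustrative, not taken from the slides):

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    x: float
    y: float
    z: float  # elevation


@dataclass(eq=False)
class Triangle:
    vertices: tuple  # three Vertex objects
    neighbors: list = field(default_factory=lambda: [None, None, None])


def shared_edge(t, t1):
    """Return the two vertices both triangles point to, or None if not adjacent."""
    common = [v for v in t.vertices if any(v is w for w in t1.vertices)]
    return tuple(common) if len(common) == 2 else None


# Hypothetical data: t and t1 share the edge between u and v.
u, v, w = Vertex(0, 0, 10), Vertex(4, 0, 12), Vertex(2, 3, 15)
p = Vertex(2, -3, 9)
t = Triangle((u, v, w))
t1 = Triangle((v, u, p))
t.neighbors[0] = t1
t1.neighbors[0] = t
edge = shared_edge(t, t1)
```

The edge is recovered by comparing vertex references, so no edge records need to be stored.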



• Alternatively, edges have an explicit representation too

[Figure: triangle t with explicit edge records e1, e2, e3]

Summary representation

• Objects have geometry and attributes; at least the attributes are in a database

• Geometry can be stored in raster or vector form; each has advantages and disadvantages

• Important geometric types of representations are those for subdivisions and for elevation models

• For subdivisions, the doubly-connected chain list is the most suitable structure

• For elevation models, grids or TINs are most useful