Representation of spatial data
GIS architecture, raster and vector, conversion, administrative subdivisions: polygon-ring, topology, extended DCEL, continuous data: contours, DEMs, TINs
Thematic map layers
• Separate storage of data according to theme: map layers
• GIS typically use tens to hundreds of map layers
• For example: municipality borders, land use, cadastral boundaries, water pipes, churches, etc.
Example map layers
Census data, 1995 (U.S.A.)
Geometry, topology and attributes
• Geometry: coordinates
• Topology: adjacency relations of objects
• Attributes: properties, values
Example: Country map of South America
• Geometry: coordinates of the borders
• Topology: which countries border which
• Attributes: names of countries, population, etc.
GIS data architectures
• Structure/philosophy of how geometrical/topological and attribute information is stored and made accessible
1. Twofold or dual architecture
2. Layered architecture
3. Integrated architecture
Pure database approach
Geometry and attributes in same relational data model
+ More concurrent users possible
- Objects must be reduced to atomic parts, and partitioned over various tables
- Retrieving original objects is expensive (join)
- Query language doesn’t know spatial concepts (area, intersects, …)
Twofold architecture
• Attributes in a DBMS
• Geometry in separate files (by theme)
• Connection by unique identifier
• Two subsystems: the DBMS and one for the geometry
E.g. ArcGIS
Twofold: pros and cons
+ DB generally known (at organizations)
+ Geometry fast and easily accessible
- More users difficult to incorporate concurrently
- Maintaining consistency between 2 systems is tricky
- Efficient transactions (optimizations) tricky because of two systems
Layered architecture
• RDBMS with additional, geographically intelligent layer
• Layer contains extension with geographical data types, e.g. point, point cluster, polygon, line
• Layer offers extension to query language, and translates for the actual RDBMS
Layered: pros and cons
+ More concurrent users possible
+ Spatial object types and concepts are present
- Middle layer not extendible
- Topological relations must be determined when they are needed
- Practice: no object type for subdivision
Integrated architecture
• Spatial object types and functions in the database itself
• RDBMS or OO
E.g. Postgres
Integrated: pros and cons
+ No translation in middle layer necessary
+ Extendible with additional types and functions
- Extension is rather complex
- Practice: less GIS-functionality present by default
Representation of geometry
• Two main approaches: raster and vector
• Both can be mixed in a GIS; the choice can differ per map layer
• Conversion raster-vector and vice versa possible
• Representation depends on type of data, way of acquisition, desired operations, etc.
Raster structure
• Division of space into equal-size cells (squares, pixels)
• Theme gives cells a value (nominal, ordinal, interval, ratio, vector, …)
• Cells should not contain any further spatial information (more detail)
Data in raster form
Point object in raster form
Line object in raster form
Plane object in raster form
Raster maps
Raster: pros and cons
+ Simple structure
+ Simple operations
+ Obtained after scanning, remote sensing
- Less suitable for point and line objects: representation does not follow intuition
- Network analysis difficult
- Not adaptive: no difference in detail possible in different regions
- Either expensive in memory, or little precision
- Not obtained after digitizing
Raster: memory reduction
• Run-length encoding: no 2-dim array but coding start pixel with value and length of run
• Block encoding: 2-dim version
• Disadvantage: makes structure and operations much more complex
Example run-length code: (34,67) forest 9
Example block code: (34,67) forest 4,6
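A minimal Python sketch of run-length encoding for a single raster row, assuming each run is stored as (start cell, value, run length) as in the example above. The function names and the sample row are illustrative assumptions, not part of the slides.

```python
from itertools import groupby

def rle_encode_row(row, row_index):
    """Encode one raster row as ((row, start_col), value, run_length) triples."""
    runs = []
    col = 0
    for value, group in groupby(row):
        length = len(list(group))
        runs.append(((row_index, col), value, length))
        col += length
    return runs

def rle_decode_row(runs):
    """Rebuild the plain list of cell values from the runs."""
    row = []
    for _start, value, length in runs:
        row.extend([value] * length)
    return row

# Hypothetical row 34 of a land-use raster.
row = ["water"] * 3 + ["forest"] * 9 + ["grass"] * 2
runs = rle_encode_row(row, row_index=34)
```

The 14 cells shrink to 3 runs, but reading the value of one cell now requires a search through the runs, which is the added complexity the slide mentions.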
Vector structure
• Objects stored as points, lines and areas
• Points have coordinates; lines connect points; areas are delimited by lines
• Attributes are stored with the objects (point, line or areal)
Vector: pros and cons
+ Elegant structure; fits with both point, line and areal objects
+ Small storage consumption
+ Precise
+ Adaptive: additional control points possible
+ Network and cluster analysis possible
+ Obtained after digitizing
- Relatively complex
- Map overlay and buffer computation complex
Vector representation of a region
• Not necessarily simply-connected:
– NL has islands
– NL has holes (Baarle-Nassau / Baarle-Hertog); there are even regions in these holes
Representation of subdivisions
Subdivisions: spaghetti model
• Every chain is represented by a list with coordinate pairs
• Split nodes are doubly stored
• Areas are not present explicitly
C1: (..,..), (..,..), (..,..), ...
C2: (..,..), (..,..), (..,..), ...
C3: (..,..), (..,..), (..,..), ...
Subdivisions: polygon ring structure
• Every area is represented by a list with coordinate pairs
• Control points are doubly stored
• Neighbor areas are difficult to determine
• Consistency is difficult to maintain
P1: (..,..), (..,..), (..,..), ...
P2: (..,..), (..,..), (..,..), ...
P3: (..,..), (..,..), (..,..), ...
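To illustrate why neighbours are awkward to determine in the polygon ring structure, here is a hedged Python sketch that recovers adjacency by comparing all boundary edges. It assumes that shared boundaries use exactly identical coordinate pairs in both rings; the function names and the three sample squares are made up for illustration.

```python
def ring_edges(ring):
    """Yield the undirected edges of a closed ring given as a list of (x, y) points."""
    n = len(ring)
    for i in range(n):
        yield frozenset((ring[i], ring[(i + 1) % n]))

def neighbours(polygons):
    """Return pairs of polygon names that share at least one boundary edge."""
    owner = {}  # edge -> names of the polygons whose ring contains it
    for name, ring in polygons.items():
        for edge in ring_edges(ring):
            owner.setdefault(edge, []).append(name)
    pairs = set()
    for names in owner.values():
        for i in range(len(names)):
            for j in range(i + 1, len(names)):
                pairs.add(tuple(sorted((names[i], names[j]))))
    return pairs

# Three hypothetical unit squares: P2 lies right of P1, P3 lies above P1.
polygons = {
    "P1": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "P2": [(1, 0), (2, 0), (2, 1), (1, 1)],
    "P3": [(0, 1), (1, 1), (1, 2), (0, 2)],
}
adjacent = neighbours(polygons)
```

Every adjacency query scans all stored coordinates, and the approach breaks as soon as the duplicated control points of two rings drift apart, which is the consistency problem named above.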
Subdivisions: topological structure
• Nodes are objects with coordinates
• Edges are connections of nodes
• Sequences of edges along polygon boundaries are connected
• Polygons are objects of which the boundary is stored
Doubly-connected edge list
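A simplified Python sketch of the idea behind the doubly-connected edge list. It assumes faces are given as counter-clockwise rings and stores only origin, face, twin and next per half-edge; a full structure would also keep previous-edge pointers and separate node and face records. All names here are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(eq=False)
class HalfEdge:
    origin: tuple                       # coordinates of the start node
    face: str                           # name of the face to the left
    twin: Optional["HalfEdge"] = None   # same edge, opposite direction
    next: Optional["HalfEdge"] = None   # next half-edge along the same face

def build_dcel(faces):
    """faces: dict name -> counter-clockwise list of (x, y) nodes.
    Returns one starting half-edge per face."""
    by_ends = {}
    first = {}
    for name, ring in faces.items():
        edges = [HalfEdge(origin=p, face=name) for p in ring]
        for i, e in enumerate(edges):
            e.next = edges[(i + 1) % len(edges)]
            by_ends[(e.origin, ring[(i + 1) % len(ring)])] = e
        first[name] = edges[0]
    for (a, b), e in by_ends.items():
        e.twin = by_ends.get((b, a))    # stays None on the outer boundary
    return first

def adjacent_faces(start):
    """Walk once around a face and collect the faces across its edges."""
    result = set()
    e = start
    while True:
        if e.twin is not None:
            result.add(e.twin.face)
        e = e.next
        if e is start:
            break
    return result

first = build_dcel({
    "P1": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "P2": [(1, 0), (2, 0), (2, 1), (1, 1)],
})
```

Once the pointers exist, the neighbours of a face follow from a single walk along its boundary, without comparing coordinates.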
Subdivisions: topological chain structure
• Splitting nodes are objects with coordinates
• Chains are connections of splitting nodes and contain zero or more nodes with coordinates
• Sequences of chains along polygon boundaries are connected
• Polygons are objects of which the boundary is stored
Doubly-connected chain list
Vector structures
                Memory  Duplication  Polygon retrieve  Topology retrieve
Spaghetti       ++      +            --                -
Polygon ring    -       --           ++                -
DC edge list    --      ++           -                 +
DC chain list   ++      ++           +                 ++
Raster-vector conversion
• Vector-to-raster: like in computer graphics: scan-conversion of lines, etc.
• Raster-to-vector:
– Consider pixel sides between pixels with different values as boundary and put in vector representation
– Thinning, line simplification
E.g. for data integration
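As one possible instance of scan-conversion, the following Python sketch uses Bresenham's line algorithm to list the raster cells covered by a line segment. The slides do not name a specific algorithm, so this choice and the function name are assumptions; end points are taken to be integer cell coordinates.

```python
def rasterize_segment(x0, y0, x1, y1):
    """Return the cells (col, row) visited by Bresenham's algorithm
    for a segment with integer end points."""
    cells = []
    dx, dy = abs(x1 - x0), -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy                 # combined error term for both axes
    while True:
        cells.append((x0, y0))
        if x0 == x1 and y0 == y1:
            break
        e2 = 2 * err
        if e2 >= dy:              # step in x
            err += dy
            x0 += sx
        if e2 <= dx:              # step in y
            err += dx
            y0 += sy
    return cells

cells = rasterize_segment(0, 0, 4, 2)
```

Each step moves at most one cell in x and one in y, so the result is a connected chain of cells from the first to the last end point.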
Thinning
Line simplification
• Douglas-Peucker algorithm from 1973
• Input: chain p1, …, pn and error ε
DP-algorithm
• Draw line segment between first and last point
• If all points in between are within error: ready
• Otherwise, determine farthest point and recursively continue on the part until farthest point and the part after farthest point
DP-algorithm
DP-standard(i, j, ε)
  Determine farthest point pk between pi and pj
  If distance(pk, pi pj) > ε then
    DP-standard(i, k, ε)
    DP-standard(k, j, ε)
    Return the concatenation of the simplifications
  Else
    Return the line segment pi pj
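The recursion can be sketched in Python as follows. This is a minimal illustration, assuming the chain is a list of (x, y) tuples and that the distance is measured to the line segment between the first and last point; the function names and the sample chain are not from the slides.

```python
import math

def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment ab."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Parameter of the projection of p on the line, clamped to the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def douglas_peucker(points, eps):
    """Return a simplified chain whose points are a subset of the input."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    k, dmax = 0, -1.0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], first, last)
        if d > dmax:
            k, dmax = i, d
    if dmax > eps:
        left = douglas_peucker(points[:k + 1], eps)
        right = douglas_peucker(points[k:], eps)
        return left[:-1] + right      # p_k occurs in both halves; keep it once
    return [first, last]

chain = [(0, 0), (1, 0.1), (2, 0), (3, 2), (4, 0)]
simplified = douglas_peucker(chain, eps=0.5)
```

With eps = 0.5 only the point (1, 0.1) is dropped; with a large eps the whole chain collapses to its first and last point.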
Properties of the DP-algorithm
• DP-algorithm does not minimize the number of points in the simplification
(Figure: output of the DP-algorithm compared with an optimal simplification)
Properties of the DP-algorithm
• Determining farthest point takes O(n) time
• Whole algorithm takes T(n) = T(m) + T(n-m+1) + O(n), T(2) = O(1) time, splitting in m and n-m+1 points
• “Fair” split gives O(n log n) time
• Worst case gives quadratic time
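The two bounds follow from the recurrence by filling in the split size. The short derivation below is a sketch that writes the O(n) term as cn for some constant c, which is an assumption made only for notation.

```latex
\begin{align*}
\text{Fair split } (m \approx n/2):\quad
  T(n) &= 2\,T(n/2) + cn = O(n \log n) \\
\text{Worst case } (m = 2):\quad
  T(n) &= T(2) + T(n-1) + cn = T(n-1) + O(n) \\
       &= O\bigl(n + (n-1) + \dots + 3\bigr) = O(n^2)
\end{align*}
```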
Properties of the DP-algorithm
• DP-algorithm may give self-intersections in the output
Solution: test output for self-intersections and continue adding control points if necessary
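The self-intersection test can be sketched in Python with a brute-force comparison of all pairs of non-adjacent segments. This is only an illustration: it takes quadratic time, it detects proper crossings only (touching or collinear overlapping segments are ignored), and the function names are assumptions.

```python
def orientation(a, b, c):
    """Sign of the cross product (b - a) x (c - a):
    1 for a left turn, -1 for a right turn, 0 if collinear."""
    v = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return (v > 0) - (v < 0)

def properly_intersect(p, q, r, s):
    """True if segments pq and rs cross in a single interior point."""
    return (orientation(p, q, r) * orientation(p, q, s) < 0 and
            orientation(r, s, p) * orientation(r, s, q) < 0)

def intersecting_segments(chain):
    """Return index pairs (i, j) of non-adjacent chain segments that cross.
    Segment i runs from chain[i] to chain[i + 1]."""
    hits = []
    n = len(chain) - 1                # number of segments
    for i in range(n):
        for j in range(i + 2, n):
            if properly_intersect(chain[i], chain[i + 1],
                                  chain[j], chain[j + 1]):
                hits.append((i, j))
    return hits

crossing = intersecting_segments([(0, 0), (2, 2), (2, 0), (0, 2)])
```

The returned pairs play the role of the set of intersecting segments that the refinement step works on.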
Improved DP-algorithm
DP-improved(i, j, ε)
  Simp = DP-standard(i, j, ε)
  V = set of intersecting segments of Simp
  Repeat
    For all segments s ∈ V
      Refine(s) in Simp: do 1 refinement à la DP by adding the farthest point
    V = set of intersecting segments of Simp
  Until V is empty
Continuous data representation
• Data on interval or ratio measurement scale
• Data values of nearby points will usually not be very different
• Representation is necessarily an approximation: finite representation of information with infinite detail
• Raster (1x) or vector (2x)
Digital Elevation Model (DEM)
Elevation models
• (Elevation) grid: raster
• Contour line model: vector
• Triangulation (TIN; triangulated irregular network): vector
Grid elevation model
TIN elevation model
Elevation models
• Contour model well-suited for visualisation, not for representation or storage
• Interpretations grid:
- elevation whole cell: not a continuous model
- elevation middle cell: interpolation needed; how?
• Advantage grid: simple storage, operations simple too
• Advantage TIN: more efficient in storage, adaptive
Interpolation for grid
• Example: four neighbouring grid elevations 20, 18, 18 and 22
• Average for the point in between: (20 + 18 + 18 + 22) / 4 = 19.5
• Linear interpolation; saddle point problem
• Linear interpolation; additional point
• Non-linear interpolation
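A small Python sketch of these options for the four values 20, 18, 18 and 22. It assumes a layout with 20 and 18 on the top row and 18 and 22 on the bottom row, which is a guess at the original figure; bilinear interpolation is used as one example of non-linear interpolation, and the function names are illustrative.

```python
def bilinear(z00, z10, z01, z11, s, t):
    """Interpolate inside a unit cell for (s, t) in [0, 1] x [0, 1].
    z00 is the value at (0, 0), z10 at (1, 0), z01 at (0, 1), z11 at (1, 1)."""
    bottom = (1 - s) * z00 + s * z10
    top = (1 - s) * z01 + s * z11
    return (1 - t) * bottom + t * top

def centre_on_diagonal(za, zb):
    """Centre value if the cell is split into two triangles along the
    diagonal between the corners with values za and zb."""
    return (za + zb) / 2

# Assumed layout: top row 20 18, bottom row 18 22.
centre_bilinear = bilinear(18, 22, 20, 18, 0.5, 0.5)   # equals the average of the four values
centre_high = centre_on_diagonal(20, 22)               # diagonal through the two high corners
centre_low = centre_on_diagonal(18, 18)                # diagonal through the two low corners
```

Under this assumed layout the two possible diagonals give different centre elevations, which is the ambiguity behind the saddle point problem; inserting an additional centre point with the averaged value removes the arbitrary choice.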
Topological TIN structure
• With explicit vertex and triangle representation
(Figure: record of triangle t with pointers to neighbouring triangles t1, t2, t3 and to vertices u, v, w; each vertex record holds x, y-coordinates and elevation)
• Because t1 has pointers to two of the same vertices as t, we can determine their shared edge, even though it is not represented explicitly
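A minimal Python sketch of a TIN with explicit vertex and triangle records, showing how the shared edge of two neighbouring triangles can be recovered from their vertex pointers. The class names, field names and sample coordinates are illustrative assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class Vertex:
    x: float
    y: float
    z: float    # elevation

@dataclass(eq=False)
class Triangle:
    vertices: tuple                                   # three Vertex objects
    neighbours: list = field(default_factory=list)    # up to three adjacent triangles

def shared_edge(t, t1):
    """Return the two vertices that t and t1 both point to, or None
    if the triangles do not share exactly two vertices."""
    common = [v for v in t.vertices if any(v is w for w in t1.vertices)]
    return tuple(common) if len(common) == 2 else None

# Hypothetical data: t and t1 share the edge between v and w.
u = Vertex(0.0, 0.0, 10.0)
v = Vertex(1.0, 0.0, 12.0)
w = Vertex(0.0, 1.0, 11.0)
p = Vertex(1.0, 1.0, 15.0)
t = Triangle((u, v, w))
t1 = Triangle((v, p, w))
t.neighbours.append(t1)
t1.neighbours.append(t)
edge = shared_edge(t, t1)
```

Edges need no records of their own in this variant; they are derived on demand by comparing the vertex pointers of adjacent triangles.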
Topological TIN structure
• Alternatively, edges have an explicit representation too
(Figure: triangle t with edges e1, e2, e3, neighbouring triangles t1, t2, t3 and vertices u, v, w)
Summary representation
• Objects have geometry and attributes; at least the attributes are in a database
• Geometry can be stored in raster or vector form; each has advantages and disadvantages
• Important geometric types of representations are those for subdivisions and for elevation models
• For subdivisions, the doubly-connected chain list is the most suitable structure
• For elevation models, grids or TINs are most useful