Stefan Falke [email protected] Stefan Falke [email protected] An Overview of Spatial Data...
Stefan Falke
An Overview of Spatial Data Analysis
http://capita.wustl.edu/ENVE424/REU/SpatialAnalysis.htm
Pop vs Soda vs Coke
http://www.popvssoda.com/
Pop vs Soda vs Coke by County
2000 Presidential Election Results
Bush: 50,456,169 votes, 30 states
Gore: 50,996,116 votes, 21 states
2000 Presidential Election Results by County
Bush
Gore
Environmental Pattern and Trend Analysis
When analyzing environmental data we examine:
Spatial Patterns
Temporal Trends
We are particularly interested in changes in these patterns and trends and in their relationships with other patterns and trends.
The analysis also strives to determine why we see these patterns and trends: what the causal factors are and what their impacts are.
Spatial and Temporal Data Analysis
Turns raw data into useful information by adding greater informative content and value
Wisdom
Knowledge /Evidence
Information
Data
What is Spatial Data Analysis?
Spatial analysis is the quantitative and qualitative study of phenomena that are located in space.
Environmental spatial data analysis describes characteristics and behavior of the environment
Explores patterns, trends, and relationships in environmental data
Seeks to explain these patterns, trends, and relationships
Differs from general data analysis and statistics in that:
Spatial data are dependent on location and related by location (they do not adhere to the independence assumption made in regular data analysis)
Spatial data have properties that require special analysis methods
Why is spatial analysis such a big deal?
About 85% of environmental data is spatial
What is GIS?
Traditional definition is that GIS is a set of computer tools for accessing, processing, visualizing, analyzing, interpreting, and presenting spatial data.
‘GIS’ is Geographical Information System. Or is it Geographical Information Science?
GISystems: Emphasis on technology and tools
GIScience: Fundamental issues raised by the use of GIS, such as
• Spatial analysis
• Map projections
• Accuracy
• Scientific visualization
Implementation and application of GIS covers a wide spectrum:
• Simple maps
• Overlaying multiple map “layers”
• Conducting proximity or cluster analysis based on distance
• Comparing data sets (simple spatial statistics)
• Complex statistical analysis
Nature, Vol 427, 22 January 2004
Special Spatial Nomenclature
Geographic – Limited to phenomena and problems relating to Earth’s surface and near-surface
Spatial – Any space, including geographic, but not restricted to geographic coordinate space, e.g. medical imaging
Geospatial – A recent term to represent the subset of spatial applied specifically to the Earth’s surface. (synonymous with geographic)
http://labs.google.com/location
Tobler’s First Law of Geography
“Everything is related to everything else, but near things are more related than distant things.”
Tobler, 1970
This general assumption is what makes spatial data subject to special statistical laws
Types of Spatial Analysis
There are literally thousands of techniques.
Bailey and Gatrell (1995) offer four spatial data analysis classes:
Point Data Analysis
• Do the locations of point data and the relationship among the points represent a ‘significant’ pattern?
Continuous Data Analysis
• What are the spatial pattern and characteristics over a region given a set of samples?
Area Data Analysis
• Analysis of data that have been aggregated over a spatial zone, e.g. county
The John Snow Map
A classic example of the use of location to draw inferences
1854 cholera outbreak in London
Point data map indicated some spatial clustering
Overlaying a map of water pump locations showed many cases were concentrated around a single pump
Continuous Data Analysis
Temperature data is well suited for converting from point to continuous data
- It has high spatial density
- Ambient temperature is relatively spatially homogeneous (no sharp gradients)
County Level Aggregated Data
Also known as a choropleth plot
Scale
The most appropriate analysis method to use depends on the spatial and temporal scales of the problem.
The spatial variability of temperature at a ‘local’ scale is not necessarily significant when conducting an analysis at the ‘regional’ or ‘global’ scale.
Scale Dependent Measurements
How long is Maine’s coastline?
length=340 km
length=355 km
length=415 km
From Longley et al., 2001
What’s in a map, anyway?
Theme: Static map
Maps of entities whose location is known and (relatively) constant
Roads, borders, locations of buildings
These types of layers are often referred to as “thematic” layers
Are usually used to provide context to other spatial data
Statistical: Realization of one of the many possible patterns that may have been generated by a process
Given a set of conditions, a given spatial pattern is just one instance among a distribution of possible patterns
The question is: Is the observed realization significantly different from what would be expected by chance?
Deterministic versus Stochastic Processes
Deterministic processes have one realization: the value at a given location is always the same, regardless of the number of times the process occurs.
Stochastic processes have multiple realizations that are not precisely predicted and involve a random component.
For our purposes, random refers to the method used to generate a pattern, not the resulting pattern itself.
Examples of Deterministic & Stochastic Processes
Deterministic: $z = 2x + 3y$
Stochastic: $z = 2x + 3y + d$, where $d$ is a random variable, $d = \pm 1$
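The difference between one realization and many can be sketched in a few lines of Python. This is an illustration only: it assumes the slide's example surfaces are z = 2x + 3y and z = 2x + 3y + d with d equal to +1 or -1, and the function names, sample locations, and seed are made up for the sketch.

```python
import random

def deterministic_z(x, y):
    """Deterministic process: one value per location, every time."""
    return 2 * x + 3 * y

def stochastic_z(x, y, rng):
    """Stochastic process: the same trend plus a random component d = +1 or -1."""
    d = rng.choice([-1, 1])
    return 2 * x + 3 * y + d

locations = [(0, 0), (1, 0), (0, 1), (1, 1)]  # hypothetical sample locations
rng = random.Random(42)  # seeded only so the sketch is repeatable

surface = [deterministic_z(x, y) for x, y in locations]
realization_a = [stochastic_z(x, y, rng) for x, y in locations]
realization_b = [stochastic_z(x, y, rng) for x, y in locations]

print("deterministic:", surface)
print("realization A:", realization_a)
print("realization B:", realization_b)
```

Every stochastic realization stays within 1 of the deterministic surface, but two realizations generally differ from each other.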
Random Spatial Processes
A random process does not mean that all events are independent of one another, as is the case with flipping a coin or rolling dice.
Rather, spatial random processes are random with dependence (or rules).
Consider a “conditionally” random display of 4 coins:
Flip the first 3 coins and display by their flipped side (head or tails)
The 4th coin will not be flipped
The 4th coin is displayed as follows:
If the 2nd and 3rd flipped coins are heads, the 4th is the same as the first
Otherwise, the 4th is the opposite of the first.
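The four-coin display can be written out directly. The sketch below is a minimal illustration of the rule as stated; the function names and the "H"/"T" encoding are choices made here, not part of the slides.

```python
import random

def fourth_coin(first, second, third):
    """Apply the display rule: the 4th coin depends on the first three."""
    if second == "H" and third == "H":
        return first
    return "T" if first == "H" else "H"

def conditional_display(rng):
    """Flip three coins at random, then derive the 4th from the rule."""
    first, second, third = (rng.choice("HT") for _ in range(3))
    return [first, second, third, fourth_coin(first, second, third)]

rng = random.Random(7)  # seeded only so the sketch is repeatable
for _ in range(5):
    print(conditional_display(rng))
```

The first three coins are independent, yet the fourth is fully determined by them: a random process with dependence.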
Basic Statistical Concepts
Frequency/Probability Distributions
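[Figure: two frequency histograms (x-axis: Bin, y-axis: Frequency), one for a Normal or Gaussian distribution (mean = median) and one for a Poisson distribution (mean = variance)]
Mean: $\bar{a} = \frac{\sum_{i=1}^{n} a_i}{n}$
Variance: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (a_i - \bar{a})^2$
Covariance: $\mathrm{Cov}(x, y) = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$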
Median: The value in the distribution at which 50% of the data points lie both above and below
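These summary measures are available in Python's standard `statistics` module. The sketch below uses made-up sample values; `statistics.variance` divides by n - 1, and the small `covariance` helper written here divides by n (dividing by n - 1 is an equally common convention).

```python
from statistics import mean, median, variance

def covariance(xs, ys):
    """Covariance of paired samples, dividing by n."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

a = [2, 4, 4, 5, 7, 8]  # hypothetical measurements
b = [1, 3, 2, 5, 6, 7]  # hypothetical paired measurements

print("mean:", mean(a))
print("median:", median(a))
print("variance:", variance(a))
print("covariance:", covariance(a, b))
```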
Distribution Summary Statistics
The features of a distribution can be summarized using:
Measures of Location
• Mean
• Median
• Quantiles
Measures of Spread
• Standard Deviation = Square Root of Variance
Measures of Shape
• Coefficient of skewness: a measure of symmetry
• Kurtosis: a measure of the likelihood of outliers
Complete Spatial Randomness
Take as an example a randomly generated point data set where
1) the chance of a given x,y point existing is equal to the chance of any other point existing (uniform probability distribution)
2) the existence of an x,y point is independent of the existence of any other point
These two conditions constitute an independent random process (IRP) or complete spatial randomness (CSR)
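A CSR pattern is easy to simulate: draw each coordinate from a uniform distribution, independently of every other point. The sketch below is a minimal illustration; the study-area size, point count, and function name are assumptions made for the example.

```python
import random

def csr_points(n, width, height, rng):
    """Generate n points under complete spatial randomness.

    Each coordinate is drawn uniformly over the study area (condition 1),
    and each draw ignores all previous draws (condition 2).
    """
    return [(rng.uniform(0, width), rng.uniform(0, height)) for _ in range(n)]

rng = random.Random(0)  # seeded only so the sketch is repeatable
points = csr_points(50, width=100.0, height=60.0, rng=rng)
print(points[:3])
```

Observed point patterns are typically compared against realizations like this one to judge whether they could have arisen by chance.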
Exploratory Spatial Data Analysis (ESDA)
Aim is to identify data properties for purposes of pattern detection
Based on the use of graphical and visual methods and of numerical techniques that are statistically robust, i.e. not much affected by extreme or atypical data values.
The ArcGIS Geostatistical Analyst extension contains a set of ESDA tools:
• Histogram (Frequency Distribution)
• Voronoi Map
• QQPlot
• Trend Analysis
Exploratory Analysis Example
Summary Statistics
Quantile Plots
Graphs the quantiles of a dataset against the quantiles of a normal distribution
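The points of a normal quantile plot can be computed without any plotting library. The sketch below pairs each sorted data value with the matching quantile of a normal distribution fitted to the data; the plotting position (i + 0.5)/n and the sample values are assumptions made for the example, and other conventions exist.

```python
from statistics import NormalDist, mean, stdev

def normal_qq_pairs(data):
    """Return (theoretical normal quantile, observed value) pairs."""
    ordered = sorted(data)
    n = len(ordered)
    reference = NormalDist(mean(ordered), stdev(ordered))
    return [
        (reference.inv_cdf((i + 0.5) / n), value)
        for i, value in enumerate(ordered)
    ]

sample = [3, 5, 4, 6, 7]  # hypothetical measurements
for theoretical, observed in normal_qq_pairs(sample):
    print(round(theoretical, 2), observed)
```

If the data are close to normally distributed, the pairs fall near a straight line when plotted.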
Voronoi Plot
Voronoi plots assign or calculate a value for each point’s polygon.
Including:
• value itself
• mean of neighboring polygons
• most frequent value among neighboring polygons
• unique value among neighbors
• variation among neighbors
Spatial Smoothing/Averaging
Data Types
Two general views to organizing spatial data:
Entities or objects
• Point measurements, rivers, structures
• Have attributes or features attached to them
• Point, vector or area format
• Values exist at discrete locations
Fields
• Continuous data such as temperature gradient fields and satellite imagery
• Values exist over an area
• Raster format (grids)
Data Types
Entities and fields can be transformed to the other type
[Figure: the same Real World scene (river, house, trees) shown in a Vector Representation, with features plotted against an X-axis and Y-axis from 100 to 600, and in a Raster Representation, with features coded into the cells of a 10 × 10 grid]
Raster and Vector Data Models
adapted from Lembo, 2003
Landcover Raster Grid
[Figure: raster grid of landcover cell values. Legend classes: Mixed conifer, Douglas fir, Oak savannah, Grassland. Value ranges: (1-5), (6-10), (11-15), (16-20)]
GIS Functionality
Filtering
Retrieves a subset of a dataset
Example
• Query (search)
Aggregation
Combines attributes or features within data sources (layers)
Examples
• Reclassify, dissolve
Integration
Combines two or more data sources (layers)
Examples
• Polygon overlay, table joining
Spatial Queries (Filter)
Identifying features based on spatial criteria
Criteria include variations on: adjacency, containment, arrangement, and connectivity
Containment: Which states “contain” the Mississippi River and its tributaries?
Adjacency: Which states are adjacent to the State of Missouri?
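Containment and adjacency queries can be sketched on toy polygons. This is a simplified illustration, not how a GIS package implements them: containment uses a ray-casting test, adjacency is reduced to "the two polygons share an edge with identical vertices", and the zone coordinates are made up.

```python
def point_in_polygon(point, polygon):
    """Ray casting: count polygon edges crossed by a ray running right from the point."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the ray's y level
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def edges(polygon):
    """Undirected edges of a polygon given as a list of vertex tuples."""
    n = len(polygon)
    return {frozenset((polygon[i], polygon[(i + 1) % n])) for i in range(n)}

def are_adjacent(poly_a, poly_b):
    """Simplified adjacency: the polygons share at least one identical edge."""
    return bool(edges(poly_a) & edges(poly_b))

zone_a = [(0, 0), (2, 0), (2, 2), (0, 2)]  # hypothetical zones
zone_b = [(2, 0), (4, 0), (4, 2), (2, 2)]
zone_c = [(5, 0), (6, 0), (6, 1), (5, 1)]

print(point_in_polygon((1, 1), zone_a))  # containment query
print(are_adjacent(zone_a, zone_b))      # adjacency query
```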
Reclassification (Aggregation)
An assignment of a class or value based on the attributes or geography of an object
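Reclassification by attribute amounts to a lookup from value to class. The sketch below bins cell values into the four value ranges shown in the landcover legend (1-5, 6-10, 11-15, 16-20); the class labels and the sample grid are placeholders, since the slides do not state which landcover type belongs to which range.

```python
from bisect import bisect_left

CLASS_UPPER_BOUNDS = [5, 10, 15, 20]
CLASS_LABELS = ["class 1-5", "class 6-10", "class 11-15", "class 16-20"]  # placeholder names

def reclassify(value):
    """Assign a cell value to the class whose range contains it."""
    if not 1 <= value <= 20:
        raise ValueError(f"value {value} is outside the legend ranges")
    return CLASS_LABELS[bisect_left(CLASS_UPPER_BOUNDS, value)]

grid = [[2, 17], [13, 8]]  # hypothetical raster cells
reclassified = [[reclassify(v) for v in row] for row in grid]
print(reclassified)
```

A dissolve step would then merge neighboring cells or polygons that ended up in the same class.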
Reclassification & Dissolve
Variable Distance Buffering
Polygon Overlay (Integration)
Topology describes the relationships between elements of a map.
A topological data structure defines the elements of the map in a way that makes it possible to know which line segments are connected to each other and to know what polygon is adjacent to each side of a line segment.
“Cookie-cutter” method
Polygon Overlay Examples
Coordinate Systems
A geographical coordinate system uses a three-dimensional spherical surface to define locations on the earth.
Divides space into orderly structure of locations.
Two types: Cartesian and angular (spherical)
© Paul Bolstad, GIS Fundamentals
Parallels and Meridians
Parallels are circles of constant latitude
Example is the equator
Meridians are great circles of constant longitude
Example is the prime meridian
latitude (φ): angular distance from equator
longitude (λ): angular distance from standard meridian
City          Latitude     Longitude
St. Louis     38° 39' N    90° 38' W
New York      40° 47' N    73° 58' W
Los Angeles   34° 3' N     118° 14' W
Rome          41° 48' N    12° 36' E
Sydney        33° 52' S    151° 12' E
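Software usually stores latitude and longitude as signed decimal degrees rather than degrees and minutes. The sketch below converts the city coordinates listed on the slide; the sign convention (south and west negative) is the common one but is an assumption here, as is the function name.

```python
def dms_to_decimal(degrees, minutes, hemisphere):
    """Convert degrees and minutes plus a hemisphere letter to signed decimal degrees."""
    value = degrees + minutes / 60
    return -value if hemisphere in ("S", "W") else value

cities = {
    "St. Louis": ((38, 39, "N"), (90, 38, "W")),
    "Sydney": ((33, 52, "S"), (151, 12, "E")),
}

for name, (lat, lon) in cities.items():
    print(name, round(dms_to_decimal(*lat), 4), round(dms_to_decimal(*lon), 4))
```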
Earth’s Expanding Waistline
From the Chronicle of Higher Education Jan 17, 2003
Datum
While a spheroid approximates the shape of the earth, a datum defines the position of the ellipsoid relative to the center of the Earth
The datum provides a frame of reference for measuring locations on the surface of the Earth
A datum is chosen to align a spheroid to closely fit the Earth’s surface in a particular area
Map Projections and Distortions
Three general types of projections:
Equal area: the ratio of areas on the earth to areas on the map is constant. Shape, angle, and scale are distorted.
Conformal: the shape of any small area of the map is preserved in its original form. If meridians and parallels intersect at 90-degree angles, then angles are also preserved.
Equidistant: preserves distances between certain points. Scale is not maintained correctly across the whole map, but typically one or more lines have their scale maintained.
Comparing Projections
Summary Statistics of a Point Pattern
Mean center: average of the x and y coordinates (geographic mean)
$\bar{s} = (\mu_x, \mu_y) = \left( \frac{\sum_{i=1}^{n} x_i}{n}, \frac{\sum_{i=1}^{n} y_i}{n} \right)$
Standard distance: average distance of points from center (provides measure of dispersion)
$d = \sqrt{\frac{\sum_{i=1}^{n} \left[ (x_i - \mu_x)^2 + (y_i - \mu_y)^2 \right]}{n}}$
Summary circle: centered at mean center with a radius of the standard distance
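The mean center and standard distance take only a few lines to compute. The sketch below treats the standard distance as the root-mean-square distance of the points from the mean center; the point coordinates and function names are made up for the example.

```python
from math import sqrt

def mean_center(points):
    """Average of the x coordinates and of the y coordinates."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)

def standard_distance(points):
    """Root-mean-square distance of the points from their mean center."""
    cx, cy = mean_center(points)
    total = sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in points)
    return sqrt(total / len(points))

events = [(0, 0), (4, 0), (4, 4), (0, 4)]  # hypothetical point pattern
center = mean_center(events)
radius = standard_distance(events)
print("summary circle: center", center, "radius", round(radius, 3))
```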
US Population Density
Geographic Center of US Population
$\bar{y} = \frac{\sum_{i=1}^{n} w_i y_i}{\sum_{i=1}^{n} w_i}$
$\bar{x} = \frac{\sum_{i=1}^{n} w_i x_i \cos(y_i)}{\sum_{i=1}^{n} w_i \cos(y_i)}$
$w$: population
$y$: latitude
$x$: longitude
The center of the US population is calculated as the average latitude and longitudes weighted by the population at a uniformly spaced set of points
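A population-weighted center can be sketched as follows. The code assumes the weighted-average reading of the slide's formulas: latitude is averaged with population weights, and longitude is additionally weighted by the cosine of latitude. The sample points and function name are hypothetical.

```python
from math import cos, radians

def population_center(points):
    """points: iterable of (longitude, latitude, population) in decimal degrees."""
    points = list(points)
    total_pop = sum(w for _, _, w in points)
    lat = sum(w * y for _, y, w in points) / total_pop
    # cos(latitude) accounts for meridians converging toward the poles.
    lon_weight = sum(w * cos(radians(y)) for _, y, w in points)
    lon = sum(w * x * cos(radians(y)) for x, y, w in points) / lon_weight
    return lon, lat

sample_points = [(-100.0, 40.0, 30), (-80.0, 40.0, 10)]  # hypothetical grid points
lon, lat = population_center(sample_points)
print(round(lon, 3), round(lat, 3))
```

With three times as many people at the western point, the center sits three quarters of the way toward it.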
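Quadrat Count
A quadrat count is conducted by superimposing a regular grid over the data, counting the number of events in each grid cell, and dividing each count by the cell area to get intensity.
Variance: $s^2 = \frac{1}{n}\sum_{i=1}^{n} (k_i - \mu)^2$, where $k_i$ is the count in cell $i$, $n$ is the number of cells, and $\mu$ is the mean cell count
40 grid cells
Mean cell count: $\mu = 47 / 40 = 1.175$
Variance: $s^2 = 85.775 / 40 = 2.1444$
Variance to mean ratio: $2.1444 / 1.175 = 1.825$
An $s^2$ to $\mu$ ratio greater than 1 indicates clustering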
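The variance to mean ratio of a quadrat count (the slide's "quadrant count") can be checked numerically. The cell counts below are hypothetical: they are one set of 40 counts chosen so that the totals match the slide's figures (47 events in 40 cells, variance of about 2.1444), not the data behind the original figure. The variance divides by the number of cells, as in the slide's arithmetic.

```python
def variance_mean_ratio(counts):
    """Return (mean, variance, variance/mean) of per-cell event counts."""
    n = len(counts)
    mu = sum(counts) / n
    s2 = sum((k - mu) ** 2 for k in counts) / n
    return mu, s2, s2 / mu

# Hypothetical counts: 18 empty cells, 9 cells with 1 event, 8 with 2, and so on.
counts = [0] * 18 + [1] * 9 + [2] * 8 + [3] * 1 + [4] * 1 + [5] * 3

mu, s2, vmr = variance_mean_ratio(counts)
print(round(mu, 3), round(s2, 4), round(vmr, 3))
```

A ratio near 1 is what a Poisson (completely random) pattern would give; the value here is well above 1, pointing to clustering.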
Spatial Autocorrelation
Defines the correlation between values of the same variable at different spatial locations
Positive Spatial Autocorrelation: Like values tend to cluster in space
Negative Spatial Autocorrelation: Neighbors are dissimilar
Zero Spatial Autocorrelation: No correlation
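One widely used measure of spatial autocorrelation is Moran's I. The slides do not name it, so it is brought in here purely as an illustration. The sketch computes it for values along a one-dimensional transect, assuming each cell's neighbors are the cells immediately before and after it (weight 1); the sample values are made up.

```python
def morans_i_transect(values):
    """Moran's I for a 1-D sequence where adjacent cells are neighbors."""
    n = len(values)
    mean_value = sum(values) / n
    deviations = [v - mean_value for v in values]
    # Each adjacent pair counts twice: once as (i, j) and once as (j, i).
    cross = 2 * sum(deviations[i] * deviations[i + 1] for i in range(n - 1))
    weight_total = 2 * (n - 1)
    return (n / weight_total) * cross / sum(d * d for d in deviations)

clustered = [1, 1, 1, 5, 5, 5]    # like values sit together
alternating = [1, 5, 1, 5, 1, 5]  # neighbors are dissimilar

print("clustered:", morans_i_transect(clustered))
print("alternating:", morans_i_transect(alternating))
```

The clustered sequence gives a positive value and the alternating sequence a negative one, matching the positive and negative cases described above.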
From points to fields
[Figure: point monitoring data → spatial estimation method → continuous surface of estimates (map)]
$c_i = \sum_{j=1}^{n} w_{ij} c_j$
$c_i$ is the estimated value at location $i$
$n$ is the number of data points
$c_j$ is the value at data point $j$
$w_{ij}$ is the weight assigned to data point $j$: the factor that determines how much influence a data point is assigned during the calculation of the estimate
The weighting factor is usually the distinguishing feature of interpolation methods.
Biggest challenge: How to determine the weights?
Inverse Distance Interpolation
$w_{ij} = \frac{d_{ij}^{-k}}{\sum_{j} d_{ij}^{-k}}$
$d_{ij}$ is the distance between location $i$ and data point $j$
$k$ is the power-law of distance weighting
Constrained to the minimum and maximum values in point data set
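Inverse distance interpolation can be sketched in a few lines. The code assumes the estimate is a weighted average of the data values, with weights proportional to distance raised to the power -k and normalized to sum to 1; the station coordinates, values, and function name are hypothetical.

```python
from math import hypot

def idw_estimate(target, samples, k=2):
    """Inverse distance weighted estimate at target=(x, y).

    samples: list of (x, y, value); k: power of the distance weighting.
    """
    raw_weights = []
    for x, y, value in samples:
        distance = hypot(target[0] - x, target[1] - y)
        if distance == 0:
            return value  # the target coincides with a data point
        raw_weights.append((distance ** -k, value))
    total = sum(w for w, _ in raw_weights)
    return sum(w * value for w, value in raw_weights) / total

stations = [(0.0, 0.0, 10.0), (2.0, 0.0, 20.0)]  # hypothetical monitors
print(idw_estimate((1.0, 0.0), stations))  # equidistant from both stations
print(idw_estimate((0.5, 0.0), stations))  # closer to the first station
```

Because the weights are positive and sum to 1, an estimate can never fall below the smallest data value or rise above the largest.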
Spatial Smoothing/Averaging
Raster Analysis (Continuous Data)
Moving Windows
3 × 3 window of cell values:
2 3 5
2 3 6
3 5 7
minimum: 2
maximum: 7
range: 5
mean: 4
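The window statistics on this slide can be reproduced with a short sketch. The function names are made up, and `moving_window` only visits interior cells, which is one of several possible ways to handle grid edges.

```python
def window_statistics(cells):
    """Summary statistics for the cell values inside one window."""
    return {
        "minimum": min(cells),
        "maximum": max(cells),
        "range": max(cells) - min(cells),
        "mean": sum(cells) / len(cells),
    }

def moving_window(grid, statistic):
    """Slide a 3 x 3 window over the interior cells of a grid."""
    result = []
    for r in range(1, len(grid) - 1):
        row = []
        for c in range(1, len(grid[0]) - 1):
            cells = [grid[r + dr][c + dc] for dr in (-1, 0, 1) for dc in (-1, 0, 1)]
            row.append(window_statistics(cells)[statistic])
        result.append(row)
    return result

window = [[2, 3, 5], [2, 3, 6], [3, 5, 7]]
stats = window_statistics([value for row in window for value in row])
print(stats)
```

Applied to a larger raster, `moving_window(grid, "mean")` produces a smoothed copy of the grid, one output cell per window position.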
Slope
Slope is the change in elevation (rise) with a change in horizontal position (run).
The steepest descent between a cell and its neighbors is known as the gradient.
Slope is often reported in degrees (0° is flat, 90° is vertical) but is also expressed as a percent.
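The two ways of reporting slope are related through the arctangent. The sketch below is a minimal illustration with made-up rise and run values; note that percent slope is undefined for a vertical face (run of zero), while the angle is still 90°.

```python
from math import atan2, degrees

def slope_percent(rise, run):
    """Slope as a percent: 100 * rise / run."""
    return 100 * rise / run

def slope_degrees(rise, run):
    """Slope as an angle from horizontal, in degrees."""
    return degrees(atan2(rise, run))

print(slope_percent(10, 100), slope_degrees(10, 100))  # gentle slope
print(slope_percent(50, 50), slope_degrees(50, 50))    # rise equals run
```

A 100% slope is a 45° angle, not a vertical face, which is a common source of confusion when the two units are mixed.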
Hands-on Exercise: Mapping Census Data
Database manipulation (table joins)
Reprojecting maps
Calculating derived values (population density, change in population over time)
Visualization
ArcGIS Main Components
ArcMap
ArcCatalog
ArcToolbox
Data Quality
It is impossible to make a perfect representation of the world, so uncertainty about it is inevitable
Uncertainty is found in data and in its processing and analysis
The outputs from spatial data analysis and GIS are only as good as the inputs and associated assumptions.
Logical Consistency
Representation of data that does not make sense:
Road in the water
Contours that cross or end
Features on steep slopes
Modifiable areal unit problem
There are multiple ways to aggregate data into zones, and different aggregations can yield different results.
Anscombe’s Quartet
These four data sets look identical from a statistical perspective.
Anscombe’s Quartet
They don’t look anything alike from a graphical perspective!!
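The statistical similarity of the four data sets can be checked directly. The values below are Anscombe's published quartet as commonly reproduced; the summary function and variable names are choices made for this sketch.

```python
from math import sqrt
from statistics import mean, variance

X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

def correlation(xs, ys):
    """Pearson correlation coefficient of paired samples."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

summary = {
    name: (mean(xs), variance(xs), mean(ys), correlation(xs, ys))
    for name, (xs, ys) in QUARTET.items()
}
for name, row in summary.items():
    print(name, [round(value, 2) for value in row])
```

The four summary rows agree to about two decimal places, which is why the data sets only reveal their differences when plotted.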