Stefan Falke stefan@me.wustl.edu An Overview of Spatial Data Analysis http://capita.wustl.edu/ENVE424/REU/SpatialAnalysis.htm


Page 1: Stefan Falke stefan@me.wustl.edu Stefan Falke stefan@me.wustl.edu An Overview of Spatial Data Analysis .

Stefan Falke

stefan@me.wustl.edu

An Overview of Spatial Data Analysis

http://capita.wustl.edu/ENVE424/REU/SpatialAnalysis.htm

Page 2: Stefan Falke stefan@me.wustl.edu Stefan Falke stefan@me.wustl.edu An Overview of Spatial Data Analysis .

Pop vs Soda vs Coke

http://www.popvssoda.com/

Page 3: Stefan Falke stefan@me.wustl.edu Stefan Falke stefan@me.wustl.edu An Overview of Spatial Data Analysis .

Pop vs Soda vs Coke by County

Page 4: Stefan Falke stefan@me.wustl.edu Stefan Falke stefan@me.wustl.edu An Overview of Spatial Data Analysis .

2000 Presidential Election Results

Bush: votes: 50,456,169; States: 30

Gore: votes: 50,996,116; States: 21

Page 5: Stefan Falke stefan@me.wustl.edu Stefan Falke stefan@me.wustl.edu An Overview of Spatial Data Analysis .

2000 Presidential Election Results by County

Bush

Gore

Page 6: Stefan Falke stefan@me.wustl.edu Stefan Falke stefan@me.wustl.edu An Overview of Spatial Data Analysis .

Environmental Pattern and Trend Analysis

When analyzing environmental data we examine:

• Temporal Trends
• Spatial Patterns

We are particularly interested in changes in these patterns and trends and relationships with other patterns and trends

The analysis also strives to determine why we see these patterns and trends – what are the causal factors and what are their impacts.

Page 7: Stefan Falke stefan@me.wustl.edu Stefan Falke stefan@me.wustl.edu An Overview of Spatial Data Analysis .

Spatial and Temporal Data Analysis

Turns raw data into useful information by adding greater informative content and value

Wisdom

Knowledge /Evidence

Information

Data

Page 8: Stefan Falke stefan@me.wustl.edu Stefan Falke stefan@me.wustl.edu An Overview of Spatial Data Analysis .

What is Spatial Data Analysis?

Spatial analysis is the quantitative and qualitative study of phenomena that are located in space.

Environmental spatial data analysis describes characteristics and behavior of the environment

• Explores patterns, trends, and relationships in environmental data
• Seeks to explain these patterns, trends, and relationships

Differs from general data analysis and statistics in that:
• Spatial data are dependent on location and related by location (they do not adhere to the independence assumption made in regular data analysis)
• Have properties that require special analysis methods

Why is spatial analysis such a big deal?

About 85% of environmental data is spatial

Page 9: Stefan Falke stefan@me.wustl.edu Stefan Falke stefan@me.wustl.edu An Overview of Spatial Data Analysis .

What is GIS?

Traditional definition is that GIS is a set of computer tools for accessing, processing, visualizing, analyzing, interpreting, and presenting spatial data.

‘GIS’ is Geographical Information System, OR IS IT Geographical Information Science?

GISystems: Emphasis on technology and tools

GIScience: Fundamental issues raised by the use of GIS, such as
• Spatial analysis
• Map projections
• Accuracy
• Scientific visualization

Implementation and application of GIS covers a wide spectrum:
• Simple maps
• Overlaying multiple map “layers”
• Conducting proximity or cluster analysis based on distance
• Comparing data sets (simple spatial statistics)
• Complex statistical analysis

Page 10: Stefan Falke stefan@me.wustl.edu Stefan Falke stefan@me.wustl.edu An Overview of Spatial Data Analysis .

Nature, Vol 427, 22 January 2004

Page 11: Stefan Falke stefan@me.wustl.edu Stefan Falke stefan@me.wustl.edu An Overview of Spatial Data Analysis .

Special Spatial Nomenclature

Geographic – Limited to phenomena and problems relating to Earth’s surface and near-surface

Spatial – Any space, including geographic, but not restricted to geographic coordinate space, e.g. medical imaging

Geospatial – A recent term to represent the subset of spatial applied specifically to the Earth’s surface. (synonymous with geographic)

Page 12: Stefan Falke stefan@me.wustl.edu Stefan Falke stefan@me.wustl.edu An Overview of Spatial Data Analysis .

http://labs.google.com/location

Page 13: Stefan Falke stefan@me.wustl.edu Stefan Falke stefan@me.wustl.edu An Overview of Spatial Data Analysis .

Tobler’s First Law of Geography

“Everything is related to everything else, but near things are more related than distant things.”

Tobler, 1970

This general assumption is what makes spatial data subject to special statistical laws

Page 14: Stefan Falke stefan@me.wustl.edu Stefan Falke stefan@me.wustl.edu An Overview of Spatial Data Analysis .

Types of Spatial Analysis

There are literally thousands of techniques. Bailey and Gatrell (1995) offer four spatial data analysis classes:

Point Data Analysis
• Do the locations of point data and the relationship among the points represent a ‘significant’ pattern?

Continuous Data Analysis
• What are the spatial pattern and characteristics over a region given a set of samples?

Area Data Analysis
• Analysis of data that have been aggregated over a spatial zone, e.g. county

Page 15: Stefan Falke stefan@me.wustl.edu Stefan Falke stefan@me.wustl.edu An Overview of Spatial Data Analysis .

The John Snow Map

A classic example of the use of location to draw inferences

1854 cholera outbreak in London

Point data map indicated some spatial clustering

Overlaying a map of water pump locations showed many cases were concentrated around a single pump

Page 16: Stefan Falke stefan@me.wustl.edu Stefan Falke stefan@me.wustl.edu An Overview of Spatial Data Analysis .

Continuous Data Analysis

Temperature data is well suited for converting from point to continuous data

- It has high spatial density

- Ambient temperature is relatively spatially homogeneous (no sharp gradients)

Page 17: Stefan Falke stefan@me.wustl.edu Stefan Falke stefan@me.wustl.edu An Overview of Spatial Data Analysis .

County Level Aggregated Data

Also known as a choropleth map

Page 18: Stefan Falke stefan@me.wustl.edu Stefan Falke stefan@me.wustl.edu An Overview of Spatial Data Analysis .

Scale

The most appropriate analysis method to use depends on the spatial and temporal scales of the problem.

The spatial variability of temperature at a ‘local’ scale is not necessarily significant when conducting an analysis at the ‘regional’ or ‘global’ scale.

Page 19: Stefan Falke stefan@me.wustl.edu Stefan Falke stefan@me.wustl.edu An Overview of Spatial Data Analysis .

Scale Dependent Measurements

How long is Maine’s coastline?

length=340 km

length=355 km

length=415 km

From Longley et al., 2001
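The dependence of measured length on measurement scale can be sketched in a few lines. This is a minimal illustration only: the wiggly line below is a made-up curve, not Maine's coastline, and `coarsen` is a hypothetical helper that mimics a longer "ruler" by skipping vertices.

```python
import math

def polyline_length(points):
    """Sum of straight-line segment lengths along a list of (x, y) points."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))

def coarsen(points, step):
    """Keep every `step`-th vertex, always retaining the final vertex."""
    kept = points[::step]
    if kept[-1] != points[-1]:
        kept.append(points[-1])
    return kept

# Hypothetical "coastline" with wiggles at two scales (illustrative data only).
coast = [(i * 1.0, 3 * math.sin(i / 4) + math.sin(i * 2.0)) for i in range(65)]

# A larger step plays the role of a longer ruler that skips small wiggles.
lengths = {step: polyline_length(coarsen(coast, step)) for step in (1, 2, 4, 8)}
for step, length in lengths.items():
    print(f"every {step} vertex: length = {length:.1f}")
```

The finer the sampling, the more small-scale detail is followed and the longer the measured line becomes.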

Page 20: Stefan Falke stefan@me.wustl.edu Stefan Falke stefan@me.wustl.edu An Overview of Spatial Data Analysis .

What’s in a map, anyway?

Theme: Static map
• Maps of entities whose location is known and constant (relatively)
• Roads, borders, locations of buildings
• These types of layers are often referred to as “thematic” layers
• Are usually used to provide context to other spatial data

Statistical: Realization of one of the many possible patterns that may have been generated by a process
• Given a set of conditions, a given spatial pattern is just one instance among a distribution of possible patterns
• The question is: Is the observed realization significantly different than what would be expected by chance?

Page 21: Stefan Falke stefan@me.wustl.edu Stefan Falke stefan@me.wustl.edu An Overview of Spatial Data Analysis .

Deterministic versus Stochastic Processes

Deterministic processes have one realization: the value at a given location is always the same, regardless of the number of times the process occurs

Stochastic processes have multiple realizations that are not precisely predicted and involve a random component.

For our purposes, random refers to the method used to generate a pattern, not the resulting pattern itself.

Page 22: Stefan Falke stefan@me.wustl.edu Stefan Falke stefan@me.wustl.edu An Overview of Spatial Data Analysis .

Examples of Deterministic & Stochastic Processes

Deterministic: $z = 2x + 3y$

Stochastic: $z = 2x + 3y + d$, where $d$ is a random variable, $d = \pm 1$
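As a rough sketch of the difference, the deterministic surface z = 2x + 3y can be evaluated once, while the stochastic version z = 2x + 3y + d gives a different realization on each run. The code assumes d is drawn as -1 or +1 with equal chance; the four locations and the seed are arbitrary choices for illustration.

```python
import random

def deterministic(x, y):
    """z = 2x + 3y: the same value at (x, y) on every run."""
    return 2 * x + 3 * y

def stochastic(x, y, rng):
    """z = 2x + 3y + d, with d drawn at random (assumed here to be -1 or +1)."""
    d = rng.choice([-1, 1])
    return 2 * x + 3 * y + d

rng = random.Random(0)  # seeded only so the sketch is repeatable
locations = [(0, 0), (1, 0), (0, 1), (1, 1)]

surface = [deterministic(x, y) for x, y in locations]
realizations = [[stochastic(x, y, rng) for x, y in locations] for _ in range(3)]

print("deterministic:", surface)
for r in realizations:
    print("realization:  ", r)
```

Each realization differs from the deterministic surface by exactly one unit at every location, but which way it differs is left to chance.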

Page 23: Stefan Falke stefan@me.wustl.edu Stefan Falke stefan@me.wustl.edu An Overview of Spatial Data Analysis .

Random Spatial Processes

A random process does not mean that all events are independent of one another, as is the case with flipping a coin or rolling dice.

Rather, spatial random processes are random with dependence (or rules).

Consider a “conditionally” random display of 4 coins:

Flip the first 3 coins and display by their flipped side (heads or tails)

The 4th coin will not be flipped

The 4th coin is displayed as follows:
• If the 2nd and 3rd flipped coins are heads, the 4th is the same as the first
• Otherwise, the 4th is the opposite of the first.
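The four-coin rule can be written out directly. This is a small sketch of the rule as stated; the function name `display_coins` and the seed are illustrative choices.

```python
import random

def display_coins(rng):
    """Return four faces ('H' or 'T'); the 4th follows the rule instead of a flip."""
    first, second, third = (rng.choice("HT") for _ in range(3))
    if second == "H" and third == "H":
        fourth = first                           # same as the first coin
    else:
        fourth = "T" if first == "H" else "H"    # opposite of the first coin
    return [first, second, third, fourth]

rng = random.Random(42)  # seeded only so the sketch is repeatable
for _ in range(5):
    print("".join(display_coins(rng)))
```

The first three coins are independent, yet the displayed pattern as a whole is random with dependence: the 4th face is fully determined by the other three.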

Page 24: Stefan Falke stefan@me.wustl.edu Stefan Falke stefan@me.wustl.edu An Overview of Spatial Data Analysis .

Basic Statistical Concepts

Frequency/Probability Distributions

[Figure: two example frequency histograms (Frequency vs. Bin), labeled Normal or Gaussian and Poisson]

Normal or Gaussian: mean = median

Poisson: mean = variance

Mean: $\bar{a} = \frac{1}{n} \sum_{i=1}^{n} a_i$

Variance: $s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (a_i - \bar{a})^2$

Covariance: $\mathrm{Cov}(x, y) = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$

Median: The value in the distribution at which 50% of the data points lie above and 50% lie below
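These summary quantities are short enough to compute by hand in code. The sketch below assumes the n - 1 denominator for the variance and the 1/n denominator for the covariance; the sample values are made up for illustration.

```python
def mean(values):
    return sum(values) / len(values)

def variance(values):
    """Sample variance, assuming the n - 1 denominator."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)

def covariance(xs, ys):
    """Covariance, assuming the 1/n denominator."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def median(values):
    """Middle value of the sorted data (average of the two middle values if n is even)."""
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

a = [2, 4, 4, 5, 7, 8]   # made-up sample values
b = [1, 3, 2, 5, 6, 7]   # made-up sample values

print("mean:", mean(a))
print("variance:", variance(a))
print("median:", median(a))
print("covariance:", covariance(a, b))
```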

Page 25: Stefan Falke stefan@me.wustl.edu Stefan Falke stefan@me.wustl.edu An Overview of Spatial Data Analysis .

Distribution Summary Statistics

The features of a distribution can be summarized using:

Measures of Location
• Mean
• Median
• Quantiles

Measures of Spread
• Standard Deviation = Square Root of Variance

Measures of Shape
• Coefficient of skewness – a measure of symmetry
• Kurtosis – a measure of the likelihood of outliers

Page 26: Stefan Falke stefan@me.wustl.edu Stefan Falke stefan@me.wustl.edu An Overview of Spatial Data Analysis .

Complete Spatial Randomness

Take as an example a randomly generated point data set where

1) the chance of a given x,y point existing is equal to the chance of any other point existing (uniform probability distribution)

2) the existence of an x,y point is independent of the existence of any other point

These two conditions constitute an independent random process (IRP) or complete spatial randomness (CSR)
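A point pattern satisfying both conditions can be simulated by drawing each coordinate uniformly and independently. This is a minimal sketch; the rectangle size, the point count, and the name `csr_points` are arbitrary assumptions.

```python
import random

def csr_points(n, width, height, rng):
    """n points, each uniform over the rectangle and drawn independently of the others."""
    return [(rng.uniform(0, width), rng.uniform(0, height)) for _ in range(n)]

rng = random.Random(7)  # seeded only so the sketch is repeatable
points = csr_points(100, 10.0, 10.0, rng)
print(points[:3])
```

Such simulated patterns are commonly used as the baseline against which an observed pattern is compared.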

Page 27: Stefan Falke stefan@me.wustl.edu Stefan Falke stefan@me.wustl.edu An Overview of Spatial Data Analysis .

Exploratory Spatial Data Analysis (ESDA)

Aim is to identify data properties for purposes of pattern detection

Based on the use of graphical and visual methods and the use of numerical techniques that are statistically robust, i.e. not much affected by extreme or atypical data values.

The ArcGIS Geostatistical Analyst extension contains a set of ESDA tools:
• Histogram (Frequency Distribution)
• Voronoi Map
• QQPlot
• Trend Analysis

Page 28: Stefan Falke stefan@me.wustl.edu Stefan Falke stefan@me.wustl.edu An Overview of Spatial Data Analysis .

Exploratory Analysis Example

Page 29: Stefan Falke stefan@me.wustl.edu Stefan Falke stefan@me.wustl.edu An Overview of Spatial Data Analysis .

Summary Statistics

Page 30: Stefan Falke stefan@me.wustl.edu Stefan Falke stefan@me.wustl.edu An Overview of Spatial Data Analysis .

Quantile Plots

Graphs the quantiles of a dataset against the quantiles of a normal distribution
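The pairs of values behind such a plot can be sketched with the standard library. This is not the ArcGIS implementation; the plotting positions (i + 0.5) / n are one common convention assumed here, and the sample values are made up.

```python
from statistics import NormalDist

def normal_qq_pairs(data):
    """Pair each sorted data value with the matching standard-normal quantile."""
    ordered = sorted(data)
    n = len(ordered)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # Plotting positions (i + 0.5) / n are an assumed convention, not the only one.
    theoretical = [std_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, ordered))

sample = [4.1, 5.3, 4.8, 6.0, 5.1, 4.4, 5.6, 5.0, 4.9]  # made-up values
pairs = normal_qq_pairs(sample)
for q, v in pairs:
    print(f"{q:6.2f}  {v:4.1f}")
```

If the data are close to normally distributed, the plotted pairs fall near a straight line.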

Page 31: Stefan Falke stefan@me.wustl.edu Stefan Falke stefan@me.wustl.edu An Overview of Spatial Data Analysis .

Voronoi Plot

Voronoi plots assign or calculate a value for each point’s polygon, including:
• value itself
• mean of neighboring polygons
• most frequent value among neighboring polygons
• unique value among neighbors
• variation among neighbors

Page 32: Stefan Falke stefan@me.wustl.edu Stefan Falke stefan@me.wustl.edu An Overview of Spatial Data Analysis .

Spatial Smoothing/Averaging

Page 33: Stefan Falke stefan@me.wustl.edu Stefan Falke stefan@me.wustl.edu An Overview of Spatial Data Analysis .

Data Types

Two general views to organizing spatial data:

Entities or objects
• Point measurements, rivers, structures
• Have attributes or features attached to them
• Point, vector or area format
• Values exist at discrete locations

Fields
• Continuous data such as temperature gradient fields and satellite imagery
• Values exist over an area
• Raster format (grids)

Page 34: Stefan Falke stefan@me.wustl.edu Stefan Falke stefan@me.wustl.edu An Overview of Spatial Data Analysis .

Data Types

Entities and fields can be transformed to the other type

Page 35: Stefan Falke stefan@me.wustl.edu Stefan Falke stefan@me.wustl.edu An Overview of Spatial Data Analysis .

Raster and Vector Data Models

[Figure: the same real-world scene (river, house, trees) shown as a Vector Representation (features plotted on X and Y axes running from 100 to 600) and as a Raster Representation (a 10 x 10 grid of cells labeled B, G, and K)]

adapted from Lembo, 2003

Page 36: Stefan Falke stefan@me.wustl.edu Stefan Falke stefan@me.wustl.edu An Overview of Spatial Data Analysis .

Landcover Raster Grid

[Figure: raster grid with integer cell values between 2 and 17. Legend: Mixed conifer, Douglas fir, Oak savannah, Grassland; value classes (1-5), (6-10), (11-15), (16-20)]

Page 37: Stefan Falke stefan@me.wustl.edu Stefan Falke stefan@me.wustl.edu An Overview of Spatial Data Analysis .

What is GIS?

Traditional definition is that GIS is a set of computer tools for accessing, processing, visualizing, analyzing, interpreting, and presenting spatial data.

‘GIS’ is Geographical Information System, OR IS IT Geographical Information Science?

GISystems: Emphasis on technology and tools

GIScience: Fundamental issues raised by the use of GIS, such as
• Spatial analysis
• Map projections
• Accuracy
• Scientific visualization

Implementation and application of GIS covers a wide spectrum:
• Simple maps
• Overlaying multiple map “layers”
• Conducting proximity or cluster analysis based on distance
• Comparing data sets (simple spatial statistics)
• Complex statistical analysis

Page 38: Stefan Falke stefan@me.wustl.edu Stefan Falke stefan@me.wustl.edu An Overview of Spatial Data Analysis .

GIS Functionality

Filtering
Retrieves a subset of a dataset
Examples
• Query (search)

Aggregation
Combines attributes or features within data sources (layers)
Examples
• Reclassify, dissolve

Integration
Combines two or more data sources (layers)
Examples
• Polygon overlay, table joining

Page 39: Stefan Falke stefan@me.wustl.edu Stefan Falke stefan@me.wustl.edu An Overview of Spatial Data Analysis .

Spatial Queries (Filter)

Identifying features based on spatial criteria

Criteria include variations on: adjacency, containment, arrangement, and connectivity

Containment: Which states “contain” the Mississippi River and its tributaries?

Adjacency: Which states are adjacent to the State of Missouri?

Page 40: Stefan Falke stefan@me.wustl.edu Stefan Falke stefan@me.wustl.edu An Overview of Spatial Data Analysis .

Reclassification (Aggregation)

An assignment of a class or value based on the attributes or geography of an object

Page 41: Stefan Falke stefan@me.wustl.edu Stefan Falke stefan@me.wustl.edu An Overview of Spatial Data Analysis .

Reclassification & Dissolve

Page 42: Stefan Falke stefan@me.wustl.edu Stefan Falke stefan@me.wustl.edu An Overview of Spatial Data Analysis .

Variable Distance Buffering

Page 43: Stefan Falke stefan@me.wustl.edu Stefan Falke stefan@me.wustl.edu An Overview of Spatial Data Analysis .

Polygon Overlay (Integration)

Topology describes the relationships between elements of a map.

A topological data structure defines the elements of the map in a way that makes it possible to know which line segments are connected to each other and to know what polygon is adjacent to each side of a line segment.

Page 44: Stefan Falke stefan@me.wustl.edu Stefan Falke stefan@me.wustl.edu An Overview of Spatial Data Analysis .

“Cookie-cutter” method

Polygon Overlay Examples

Page 45: Stefan Falke stefan@me.wustl.edu Stefan Falke stefan@me.wustl.edu An Overview of Spatial Data Analysis .

Coordinate Systems

A geographical coordinate system uses a three-dimensional spherical surface to define locations on the earth.

Divides space into orderly structure of locations.

Two types: Cartesian and angular (spherical)

© Paul Bolstad, GIS Fundamentals

Page 46: Stefan Falke stefan@me.wustl.edu Stefan Falke stefan@me.wustl.edu An Overview of Spatial Data Analysis .

Parallels and Meridians

Meridians are great circles of constant longitude
Example is the prime meridian

Parallels are circles of constant latitude
Example is the equator

latitude (φ): angular distance from equator

longitude (λ): angular distance from standard meridian

City          Latitude     Longitude
St. Louis     38° 39' N    90° 38' W
New York      40° 47' N    73° 58' W
Los Angeles   34° 3' N     118° 14' W
Rome          41° 48' N    12° 36' E
Sydney        33° 52' S    151° 12' E
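Coordinates given in degrees and minutes are usually converted to signed decimal degrees before any calculation. A minimal sketch, assuming the common convention that south latitudes and west longitudes are negative; the function name is illustrative and the city values are the ones listed on this slide.

```python
def dms_to_decimal(degrees, minutes, hemisphere):
    """Convert degrees and minutes plus N/S/E/W to signed decimal degrees."""
    value = degrees + minutes / 60.0
    return -value if hemisphere in ("S", "W") else value

cities = {
    "St. Louis":   ((38, 39, "N"), (90, 38, "W")),
    "New York":    ((40, 47, "N"), (73, 58, "W")),
    "Los Angeles": ((34, 3, "N"), (118, 14, "W")),
    "Rome":        ((41, 48, "N"), (12, 36, "E")),
    "Sydney":      ((33, 52, "S"), (151, 12, "E")),
}

for name, (lat, lon) in cities.items():
    print(f"{name:12s} {dms_to_decimal(*lat):8.3f} {dms_to_decimal(*lon):9.3f}")
```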

Page 47: Stefan Falke stefan@me.wustl.edu Stefan Falke stefan@me.wustl.edu An Overview of Spatial Data Analysis .

Earth’s Expanding Waistline

From the Chronicle of Higher Education Jan 17, 2003

Page 48: Stefan Falke stefan@me.wustl.edu Stefan Falke stefan@me.wustl.edu An Overview of Spatial Data Analysis .

Datum

While a spheroid approximates the shape of the earth, a datum defines the position of the spheroid relative to the center of the Earth

The datum provides a frame of reference for measuring locations on the surface of the Earth

A datum is chosen to align a spheroid to closely fit the Earth’s surface in a particular area

Page 49: Stefan Falke stefan@me.wustl.edu Stefan Falke stefan@me.wustl.edu An Overview of Spatial Data Analysis .

Map Projections and Distortions

Three general types of projections:

Equal area – the ratio of areas on the earth and on the map is constant. Shape, angle, and scale are distorted.

Conformal – the shape of any small surface of the map is preserved in its original form. If meridians and parallels intersect at 90-degree angles, then angles are also preserved.

Equidistant – preserves distances between certain points. Scale is not maintained correctly everywhere; however, typically one or more lines have their scale maintained.

Page 50: Stefan Falke stefan@me.wustl.edu Stefan Falke stefan@me.wustl.edu An Overview of Spatial Data Analysis .

Comparing Projections

Page 51: Stefan Falke stefan@me.wustl.edu Stefan Falke stefan@me.wustl.edu An Overview of Spatial Data Analysis .

Summary Statistics of a Point Pattern

Mean center: average of the x and y coordinates (geographic mean)

$\bar{s} = (\mu_x, \mu_y) = \left( \frac{\sum_{i=1}^{n} x_i}{n}, \frac{\sum_{i=1}^{n} y_i}{n} \right)$

Standard distance: average distance of points from center (provides measure of dispersion)

$d = \sqrt{\frac{\sum_{i=1}^{n} \left[ (x_i - \mu_x)^2 + (y_i - \mu_y)^2 \right]}{n}}$

Summary circle: centered at mean center with a radius of the standard distance
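The mean center and standard distance can be sketched as follows, taking the standard distance as the root-mean-square distance of the points from the mean center. The four points are made-up coordinates for illustration.

```python
import math

def mean_center(points):
    """Average x and average y of a list of (x, y) points."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)

def standard_distance(points):
    """Root-mean-square distance of the points from their mean center."""
    mx, my = mean_center(points)
    n = len(points)
    return math.sqrt(sum((x - mx) ** 2 + (y - my) ** 2 for x, y in points) / n)

pts = [(0, 0), (4, 0), (4, 4), (0, 4)]  # made-up points: corners of a square

center = mean_center(pts)
radius = standard_distance(pts)
print("mean center:", center)
print("standard distance (summary circle radius):", round(radius, 3))
```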

Page 52: Stefan Falke stefan@me.wustl.edu Stefan Falke stefan@me.wustl.edu An Overview of Spatial Data Analysis .

US Population Density

Page 53: Stefan Falke stefan@me.wustl.edu Stefan Falke stefan@me.wustl.edu An Overview of Spatial Data Analysis .

Geographic Center of US Population

The center of the US population is calculated as the average latitude and longitude, weighted by the population at a uniformly spaced set of points

$\bar{y} = \frac{\sum_{i=1}^{n} w_i y_i}{\sum_{i=1}^{n} w_i}$

$\bar{x} = \frac{\sum_{i=1}^{n} w_i x_i \cos(y_i)}{\sum_{i=1}^{n} w_i \cos(y_i)}$

w: population

y: latitude

x: longitude
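A population-weighted center can be sketched as below, assuming the longitude average also carries a cos(latitude) factor alongside the population weight. The two sample points and the name `population_center` are hypothetical.

```python
import math

def population_center(points):
    """points: list of (longitude_deg, latitude_deg, population) tuples."""
    total_w = sum(w for _, _, w in points)
    lat = sum(w * y for _, y, w in points) / total_w
    # Longitude is weighted by population times cos(latitude), an assumption
    # that accounts for meridians converging toward the poles.
    num = sum(w * x * math.cos(math.radians(y)) for x, y, w in points)
    den = sum(w * math.cos(math.radians(y)) for _, y, w in points)
    return num / den, lat

# Hypothetical data: two places at the same latitude, one three times as populous.
sample = [(-100.0, 40.0, 1), (-80.0, 40.0, 3)]
lon, lat = population_center(sample)
print(round(lon, 3), round(lat, 3))
```

The center is pulled toward the more populous point, three quarters of the way along the line between the two.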

Page 54: Stefan Falke stefan@me.wustl.edu Stefan Falke stefan@me.wustl.edu An Overview of Spatial Data Analysis .

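Quadrat Count

A quadrat count is conducted by superimposing a regular grid over the data, counting the number of events in each grid cell, and dividing each count by the cell area to get intensity.

Variance: $s^2 = \frac{1}{n} \sum_{i=1}^{n} (k_i - \mu)^2$, where $k_i$ is the count in cell $i$, $n$ is the number of cells, and $\mu$ is the mean cell count

Example with 40 grid cells:

Mean cell count: $\mu = 47 / 40 = 1.175$

Variance: $s^2 = 85.775 / 40 = 2.1444$

Variance / mean: $2.1444 / 1.175 = 1.825$

An $s^2$ to $\mu$ ratio greater than 1 indicates clustering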
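The variance-to-mean ratio is straightforward to compute from a list of cell counts. In this sketch the variance uses the 1/n form, and the counts are hypothetical: the slide does not list the per-cell counts, so these were chosen only so that 40 cells hold 47 events with the same totals as the worked numbers.

```python
def variance_mean_ratio(counts):
    """Return (mean, variance, variance / mean) of quadrat counts, using the 1/n variance."""
    n = len(counts)
    mu = sum(counts) / n
    s2 = sum((k - mu) ** 2 for k in counts) / n
    return mu, s2, s2 / mu

# Hypothetical cell counts (not from the slide): 40 cells, 47 events in total.
counts = [0] * 21 + [1] * 5 + [2] * 4 + [3] * 7 + [4] * 2 + [5] * 1

mu, s2, vmr = variance_mean_ratio(counts)
print(f"mean = {mu:.3f}, variance = {s2:.4f}, ratio = {vmr:.3f}")
```

A ratio near 1 is what a Poisson (completely random) pattern would give, since its mean equals its variance.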

Page 55: Stefan Falke stefan@me.wustl.edu Stefan Falke stefan@me.wustl.edu An Overview of Spatial Data Analysis .

Spatial Autocorrelation

Defines the correlation between values of the same variable at different spatial locations

Positive Spatial Autocorrelation: Like values tend to cluster in space

Negative Spatial Autocorrelation: Neighbors are dissimilar

Zero Spatial Autocorrelation: No correlation
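One very simple way to see the sign of spatial autocorrelation is to correlate each value along a line of locations with its immediate neighbor. This is a simplified illustration on a one-dimensional transect, not one of the standard spatial statistics and not a method taken from the slides; the values are made up.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def neighbor_correlation(values):
    """Correlate each value along a transect with the value at the next location."""
    return pearson(values[:-1], values[1:])

clustered = [1, 1, 1, 1, 5, 5, 5, 5]     # like values sit together
alternating = [1, 5, 1, 5, 1, 5, 1, 5]   # neighbors are dissimilar

print("clustered:  ", round(neighbor_correlation(clustered), 2))
print("alternating:", round(neighbor_correlation(alternating), 2))
```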

Page 56: Stefan Falke stefan@me.wustl.edu Stefan Falke stefan@me.wustl.edu An Overview of Spatial Data Analysis .

From points to fields

point monitoring data -> spatial estimation method -> continuous surface of estimates (map)

$c_i = \sum_{j=1}^{n} w_{ij} c_j$

$c_i$ is the estimated value at location i

n is the number of data points

$c_j$ is the value at data point j

$w_{ij}$ is the weight assigned to data point j: the factor that determines how much influence a data point is assigned during the calculation of the estimate

The weighting factor is usually the distinguishing feature of interpolation methods.

Biggest challenge: How to determine the weights?

Page 57: Stefan Falke stefan@me.wustl.edu Stefan Falke stefan@me.wustl.edu An Overview of Spatial Data Analysis .

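Inverse Distance Interpolation

$w_{ij} = \frac{d_{ij}^{-k}}{\sum_{j} d_{ij}^{-k}}$

$d_{ij}$ is the distance between location i and data point j

k is the power-law of distance weighting

Estimates are constrained to the minimum and maximum values in the point data set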
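Inverse distance weighting can be sketched in a few lines: each data point gets a weight proportional to its distance raised to the power -k, and the weights are normalized to sum to 1. The default k = 2, the sample data, and the choice to return the data value when the target coincides with a data point are assumptions made for this illustration.

```python
import math

def idw_estimate(target, samples, k=2):
    """Inverse distance weighted estimate at `target` from (x, y, value) samples."""
    raw = []
    for x, y, value in samples:
        d = math.dist(target, (x, y))
        if d == 0:
            return value              # at a data point, use the data value itself
        raw.append((d ** -k, value))
    total = sum(w for w, _ in raw)    # normalizing makes the weights sum to 1
    return sum(w / total * value for w, value in raw)

samples = [(0, 0, 10.0), (10, 0, 20.0), (0, 10, 30.0)]  # made-up monitoring data

print(idw_estimate((5, 5), samples))   # equidistant from all three points
print(idw_estimate((1, 1), samples))   # dominated by the nearby value of 10
```

Because the weights are positive and sum to 1, every estimate lies between the smallest and largest data values.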

Page 58: Stefan Falke stefan@me.wustl.edu Stefan Falke stefan@me.wustl.edu An Overview of Spatial Data Analysis .

Spatial Smoothing/Averaging

Page 59: Stefan Falke stefan@me.wustl.edu Stefan Falke stefan@me.wustl.edu An Overview of Spatial Data Analysis .

Landcover Raster Grid

[Figure: raster grid with integer cell values between 2 and 17. Legend: Mixed conifer, Douglas fir, Oak savannah, Grassland; value classes (1-5), (6-10), (11-15), (16-20)]

Page 60: Stefan Falke stefan@me.wustl.edu Stefan Falke stefan@me.wustl.edu An Overview of Spatial Data Analysis .

Raster Analysis (Continuous Data)

Moving Windows

3 x 3 window of cell values:

2 3 5
2 3 6
3 5 7

minimum: 2

maximum: 7

range: 5

mean: 4
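The window statistics can be reproduced with a short sketch. The helper names `focal` and `window_stats` are illustrative, and clipping the window at the grid edges is an assumption; the grid is the 3 x 3 example from this slide.

```python
def focal(grid, row, col, size=3):
    """Cell values in the size x size window centered on (row, col), clipped at the edges."""
    half = size // 2
    return [grid[r][c]
            for r in range(max(0, row - half), min(len(grid), row + half + 1))
            for c in range(max(0, col - half), min(len(grid[0]), col + half + 1))]

def window_stats(cells):
    """Minimum, maximum, range, and mean of the cells in one window."""
    return {
        "minimum": min(cells),
        "maximum": max(cells),
        "range": max(cells) - min(cells),
        "mean": sum(cells) / len(cells),
    }

grid = [[2, 3, 5],
        [2, 3, 6],
        [3, 5, 7]]

stats = window_stats(focal(grid, 1, 1))
print(stats)
```

On a larger raster the same two functions would be applied with the window centered on each cell in turn, writing one statistic per cell to an output grid.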

Page 61: Stefan Falke stefan@me.wustl.edu Stefan Falke stefan@me.wustl.edu An Overview of Spatial Data Analysis .

Slope

Slope is the change in elevation (rise) with a change in horizontal position (run).

The steepest descent between a cell and its neighbors is known as the gradient.

Slope is often reported in degrees (0° is flat, 90° is vertical) but is also expressed as a percent
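The two ways of reporting slope are related through rise over run. A minimal sketch, with illustrative function names:

```python
import math

def slope_percent(rise, run):
    """Slope as a percent: 100 * rise / run."""
    return 100.0 * rise / run

def slope_degrees(rise, run):
    """Slope as an angle above horizontal, in degrees."""
    return math.degrees(math.atan2(rise, run))

for rise, run in [(0, 10), (1, 10), (10, 10)]:
    print(rise, run, round(slope_percent(rise, run), 1), round(slope_degrees(rise, run), 1))
```

Note that a 100% slope corresponds to 45°, not to vertical; percent slope grows without bound as the angle approaches 90°.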

Page 62: Stefan Falke stefan@me.wustl.edu Stefan Falke stefan@me.wustl.edu An Overview of Spatial Data Analysis .

Hands-on Exercise: Mapping Census Data

• Database manipulation (table joins)
• Reprojecting maps
• Calculating derived values (population density, change in population over time)
• Visualization

Page 63: Stefan Falke stefan@me.wustl.edu Stefan Falke stefan@me.wustl.edu An Overview of Spatial Data Analysis .

ArcGIS Main Components

ArcMap

ArcCatalog

ArcToolbox

Page 64: Stefan Falke stefan@me.wustl.edu Stefan Falke stefan@me.wustl.edu An Overview of Spatial Data Analysis .
Page 65: Stefan Falke stefan@me.wustl.edu Stefan Falke stefan@me.wustl.edu An Overview of Spatial Data Analysis .

Data Quality

It is impossible to make a perfect representation of the world, so uncertainty about it is inevitable

Uncertainty is found in data and in its processing and analysis

The outputs from spatial data analysis and GIS are only as good as the inputs and associated assumptions.

Page 66: Stefan Falke stefan@me.wustl.edu Stefan Falke stefan@me.wustl.edu An Overview of Spatial Data Analysis .

Logical Consistency

Representation of data that does not make sense
• Road in the water
• Contours that cross or end
• Features on steep slopes

Page 67: Stefan Falke stefan@me.wustl.edu Stefan Falke stefan@me.wustl.edu An Overview of Spatial Data Analysis .

Modifiable areal unit problem

Data can be aggregated into zones in multiple ways, and different zonings can yield different results.
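The effect can be shown with a toy example: the same cell values averaged into zones of two different sizes. The values and the helper name `zone_means` are made up for illustration.

```python
def zone_means(values, zone_size):
    """Average consecutive values in zones of `zone_size` cells each."""
    return [sum(values[i:i + zone_size]) / zone_size
            for i in range(0, len(values), zone_size)]

cells = [1, 1, 9, 9, 1, 1, 9, 9]   # made-up values along a transect

print("zones of 2:", zone_means(cells, 2))
print("zones of 4:", zone_means(cells, 4))
```

With small zones the data show strong contrasts between neighboring zones; with larger zones the same data look perfectly uniform.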

Page 68: Stefan Falke stefan@me.wustl.edu Stefan Falke stefan@me.wustl.edu An Overview of Spatial Data Analysis .

Anscombe’s Quartet

These four data sets look identical from a statistical perspective.

Page 69: Stefan Falke stefan@me.wustl.edu Stefan Falke stefan@me.wustl.edu An Overview of Spatial Data Analysis .

Anscombe’s Quartet

They don’t look anything alike from a graphical perspective!!
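The point can be checked numerically. The sketch below uses the values of Anscombe's four data sets as commonly published (they do not appear in the slides themselves) and computes the mean of x, the mean of y, and the correlation for each.

```python
import math

x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I":   (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II":  (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV":  ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
            [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

def summary(xs, ys):
    """Return (mean of x, mean of y, Pearson correlation)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return mx, my, sxy / math.sqrt(sxx * syy)

results = {name: summary(xs, ys) for name, (xs, ys) in quartet.items()}
for name, (mx, my, r) in results.items():
    print(f"{name:4s} mean x = {mx:.2f}  mean y = {my:.2f}  r = {r:.2f}")
```

The summary statistics agree to two decimal places, so only plotting the data reveals how different the four sets are.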