Interactive Techniques and Exploratory Spatial Data Analysis
Spatial Association and spatial statistic techniques
Danlin Yu
Ph.D. Candidate
Dept. of Geography, UWM
Detecting Spatial Association
What is spatial association?
Spatial objects tend to relate to one another
Types of spatial association
Spatial autocorrelation: similar (or dissimilar) values tend to cluster together in space
Spatial heterogeneity: spatial regimes; space is not homogeneous
Autocorrelation and heterogeneity are closely related
Detecting spatial association
Why study spatial association?
It is inherent in geographic research
When working with spatial data, analyses based on conventional (non-spatial) statistics are very likely to be misleading or incorrect
How to detect spatial association
The power of GIS
Exploratory Spatial Data Analysis (ESDA): let the data speak
Background
The first law of geography: everything is related to everything else, but nearby things are more related than distant things
Characteristics of spatial statistics
The existence of spatial association violates an important statistical assumption: independence
Spatial patterns are the results of spatial processes; the pattern we see is one of numerous possible outcomes of the same spatial process
Types of spatial association
Point spatial association: distance is critical in deciding point spatial association
Line spatial association: distance and path
Areal spatial association: distance and contiguity
Today’s topic: univariate SA
Univariate: for pattern detection
Examples: per capita GDP for economic performance patterns; surface temperature for local climate patterns, etc.
Central question: is the pattern we see the result of some specific process (usually a random or normal process, which serves as our null hypothesis)?
Multivariate: spatial regression or geographically weighted regression (GWR)
Researching means
Hypothesis testing to answer this question is conducted by means of spatial statistics
For univariate geographic data, there are a few indexes in the literature:
Moran’s Index (Moran’s I)
Geary’s Index (Geary’s c)
Getis’s G or O
Spatial statistic indexes
The purposes of the three indexes are very similar: calculate an index from the geographic data, then test it against the null hypothesis
The most frequently encountered index is Moran’s I
The discussion of Moran’s I applies to the other indexes with minor adjustments
Moran’s Index (I)
Structured like the Pearson’s product-moment statistic: measure of covariance
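$$I = \frac{n}{\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}} \cdot \frac{\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}(y_i-\bar{y})(y_j-\bar{y})}{\sum_{i=1}^{n}(y_i-\bar{y})^2}$$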
Moran’s I
$w_{ij}$ is the weight: $w_{ij}=1$ if locations i and j are adjacent and zero otherwise ($w_{ii}=0$; a region is not adjacent to itself)
$y_i$ and $\bar{y}$ are the value of the variable at the ith location and the mean of the variable, respectively
n is the total number of observations
I is used to test hypotheses concerning similarity
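The formula can be checked by hand on a tiny data set. The following is a minimal sketch in plain Python, not the SPDEP implementation; the function name `morans_i` and the four-location `chain` layout are assumptions made for illustration.

```python
def morans_i(y, w):
    """Global Moran's I for values y and an n x n weights matrix w (list of lists)."""
    n = len(y)
    mean = sum(y) / n
    z = [v - mean for v in y]                  # deviations from the mean
    s0 = sum(sum(row) for row in w)            # sum of all weights
    cross = sum(w[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    ss = sum(d * d for d in z)                 # sum of squared deviations
    return (n / s0) * (cross / ss)


# Hypothetical layout: four locations in a row, each adjacent to the next.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

print(morans_i([1, 2, 3, 4], chain))   # smooth gradient: positive I
print(morans_i([1, 4, 1, 4], chain))   # alternating values: negative I
```

Similar neighboring values push I above zero, while alternating values push it below zero.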
Determining the weights
Two rules
Distance: locations within a certain distance of each other are considered neighbors
Border-sharing (for areal units only): areas that share borders are considered neighbors
Weights matrix: can be symmetric or asymmetric; either a binary weights matrix or a general weights matrix (e.g., distance-decaying weights)
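The distance rule and a distance-decaying alternative can be sketched as follows. This is an illustrative sketch only: the function names, the point coordinates, and the choice of 1/d as the decay function are assumptions, and coincident points (d = 0) are not handled.

```python
import math


def distance_band_weights(points, threshold):
    """Binary weights: 1 if two distinct points lie within `threshold`, else 0."""
    n = len(points)
    w = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(points[i], points[j]) <= threshold:
                w[i][j] = 1
    return w


def inverse_distance_weights(points):
    """General weights that decay with distance (1/d); the diagonal stays 0."""
    n = len(points)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                w[i][j] = 1.0 / math.dist(points[i], points[j])
    return w


# Hypothetical coordinates for three locations on a line.
pts = [(0, 0), (1, 0), (3, 0)]
print(distance_band_weights(pts, 1.5))
print(inverse_distance_weights(pts))
```

With a threshold of 1.5 the third point has no neighbors at all, which is one reason to try several distances before settling on a weights structure.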
Determining the weights
Spatial weights matrix should be constructed judiciously
Ideally, related to general concepts from spatial interaction theory, such as the notions of accessibility and potential etc.
Determining the weights
When used in hypothesis testing, this requirement is less stringent
Since our purpose is to test the null – spatial independence
Still, trying a few structures is a good idea – border sharing, different distances
Determining the weights
A typical symmetric weights matrix is a binary weights matrix in which neighbors are coded as 1 and all others as 0
Without loss of generality, it is usually row standardized: all elements of each row add up to 1
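Row standardization is a simple division of each row by its sum. The sketch below assumes a binary matrix stored as a list of lists; the name `row_standardize` and the treatment of rows without neighbors (left as zeros) are choices made here for illustration.

```python
def row_standardize(w):
    """Divide each row by its sum so that the row adds up to 1.

    Rows with no neighbors (sum 0) are left as all zeros.
    """
    out = []
    for row in w:
        total = sum(row)
        out.append([v / total if total else 0.0 for v in row])
    return out


# Hypothetical binary matrix: location 1 borders both 0 and 2.
binary = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
ws = row_standardize(binary)
print(ws)   # [[0.0, 1.0, 0.0], [0.5, 0.0, 0.5], [0.0, 1.0, 0.0]]
```

Note that the standardized matrix is generally no longer symmetric: here the weight from location 0 to 1 is 1.0, but from 1 to 0 it is 0.5.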
Hypothesis testing
The expected value and the variance of Moran’s I are used for testing
However, under the null hypothesis Moran’s I is often observed not to follow a normal distribution
Alternatives
Random permutation
Saddlepoint approximation
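For reference, the test based on the expected value and variance compares the observed I with its expectation under the null of no spatial autocorrelation. The expectation below is the standard result; the variance depends on the weights matrix and on the sampling assumption and is not reproduced here.

```latex
E[I] = -\frac{1}{n-1}, \qquad
z = \frac{I - E[I]}{\sqrt{\operatorname{Var}[I]}}
```

The z-score is only as reliable as the normality assumption behind it, which is why the alternatives listed above are often preferred.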
Hypothesis testing
Monte Carlo (random) permutation for Moran’s I
Randomly rearrange the values across the locations and recalculate I each time (e.g., 999 times)
Compare the actual I with the 999 randomly generated Is
If the actual I falls above the 95th percentile or below the 5th percentile of the permuted values, I is said to be pseudo-significant at the 5% level (positive or negative association, respectively)
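The permutation procedure can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the SPDEP routine: the function names, the fixed seed, the 20-location chain, and the (count + 1) / (permutations + 1) convention for the pseudo p-value are all choices made here.

```python
import random


def morans_i(y, w):
    """Global Moran's I for values y and weights matrix w (list of lists)."""
    n = len(y)
    mean = sum(y) / n
    z = [v - mean for v in y]
    s0 = sum(sum(row) for row in w)
    cross = sum(w[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(d * d for d in z)


def permutation_test(y, w, n_perm=999, seed=0):
    """Return the observed I and pseudo p-values for the upper and lower tails."""
    rng = random.Random(seed)        # fixed seed so the sketch is repeatable
    observed = morans_i(y, w)
    shuffled = list(y)
    sims = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)        # reassign the values to locations at random
        sims.append(morans_i(shuffled, w))
    p_high = (sum(s >= observed for s in sims) + 1) / (n_perm + 1)
    p_low = (sum(s <= observed for s in sims) + 1) / (n_perm + 1)
    return observed, p_high, p_low


# Hypothetical data: 20 locations in a row with a steady gradient of values.
n = 20
chain = [[1 if abs(i - j) == 1 else 0 for j in range(n)] for i in range(n)]
observed, p_high, p_low = permutation_test(list(range(n)), chain)
print(observed, p_high, p_low)
```

A small `p_high` points to positive association (clustering of similar values), and a small `p_low` points to negative association.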
Hypothesis testing
Saddlepoint approximation (Tiefelsdorf, 2001)
The exact distribution of Moran’s I can be obtained, but it is computationally prohibitive even for medium-sized data sets
A saddlepoint distribution approximates the exact distribution with reasonable accuracy
It is based on a ratio of quadratic forms in normal variables
Usually, random permutation will do the job
Global and local (1)
The Moran’s I just introduced is based on simultaneous measurements from many locations; hence, it is a GLOBAL statistic
Global statistics provide only a limited set of spatial association measurements
You see the overall pattern, but the details are ignored: the forest-and-trees dilemma
Global and local (2)
Recently, a number of statistics have been developed to measure dependence in only a portion of the study area: the local statistics
In spatial data analysis, these are known as Local Indicators of Spatial Association (LISA), following Anselin (1995)
Global and local (3)
Definition of LISA (Anselin, 1995)
The local statistic for each observation gives an indication of the extent of significant spatial clustering of similar values around that observation
The sum of the local statistics for all observations is proportional (or equal) to a corresponding global statistic
Global and local (4)
Local statistics are well suited to:
Identify the existence of pockets or “hot spots”
Assess assumptions of stationarity
Identify distances beyond which no discernible association obtains
Global and local statistics are often used together for a thorough understanding of spatial association and processes
Global and local (5)
This discussion is based on the decomposition of Moran’s I into its local version
Other indexes can be decomposed similarly; however, there is an important aspect of Moran’s I that assists further understanding in spatial analysis
It can be decomposed into its local version AND a graphic version: the Moran’s scatterplot
Local Moran’s I
Following Anselin’s (1995) definition, a local Moran’s $I_i$ may be defined as:
$$I_i = z_i \sum_{j=1}^{n} w_{ij} z_j$$
The $z_i$ are the deviations of the $y_i$ from their mean
The weights are row standardized
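The local statistic defined above is straightforward to compute. The sketch below follows that unscaled definition, with row standardization done inside the function; the function name and the four-location chain are assumptions made for illustration.

```python
def local_morans_i(y, w):
    """Local Moran's I_i = z_i * sum_j w_ij z_j, with w row standardized here."""
    n = len(y)
    mean = sum(y) / n
    z = [v - mean for v in y]                      # deviations from the mean
    local = []
    for i in range(n):
        row_sum = sum(w[i])
        if row_sum:
            lag = sum(w[i][j] * z[j] for j in range(n)) / row_sum
        else:
            lag = 0.0                              # no neighbors: no contribution
        local.append(z[i] * lag)
    return local


# Hypothetical layout: four locations in a row, each adjacent to the next.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
y = [1, 2, 3, 4]
local = local_morans_i(y, chain)
print(local)                                       # [0.75, 0.25, 0.25, 0.75]

# The local values add up to the global I times the sum of squared deviations.
mean = sum(y) / len(y)
sum_sq = sum((v - mean) ** 2 for v in y)
print(sum(local) / sum_sq)                         # global I with row-standardized weights
```

In this example the two end locations contribute most to the global index, which illustrates how the sum of the local statistics is proportional to the global statistic.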
Local Moran’s I
Hypothesis testing for local Moran’s I is more complex
The distribution of local Moran’s I is definitely not normal; furthermore, it is influenced by the global pattern
Random permutation won’t work: for one specific location, the mean and variance of the local Moran’s I keep changing during the permutation, which is not the case for the global statistic
Local Moran’s I
The exact distribution of local Moran’s I can be obtained, but doing so is computationally prohibitive
The saddlepoint approximation is so far one potential solution
Details can be found in Tiefelsdorf (2000; 2002)
Local Moran’s I
In addition, local Moran’s I values are correlated with one another because of overlapping neighbors
A Bonferroni correction or another multiple-testing correction is needed to obtain robust test results
These are all available in the SPDEP package in R
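The Bonferroni correction itself is simple arithmetic: each p-value is multiplied by the number of tests and capped at 1. The sketch below is a generic illustration with made-up p-values, not the SPDEP code path.

```python
def bonferroni(p_values):
    """Bonferroni-adjusted p-values: p * m, capped at 1, for m simultaneous tests."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical p-values for the local statistics at three locations.
adjusted = bonferroni([0.01, 0.04, 0.5])
print(adjusted)
```

With three tests, only the first location stays below 0.05 after adjustment. With one test per location the correction becomes quite conservative as the number of locations grows.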
Moran’s scatterplot
A graphic tool for detecting local spatial association
Derived directly from the global Moran’s I
It can be used together with the local Moran’s I for better understanding
Moran’s scatterplot
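Recall the formula of Moran’s I:
$$I = \frac{n}{\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}} \cdot \frac{\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}(y_i-\bar{y})(y_j-\bar{y})}{\sum_{i=1}^{n}(y_i-\bar{y})^2}$$
If a row-standardized weights matrix is used, the first term will be 1 (each row sums to 1, so all the weights together sum to n)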
Moran’s scatterplot
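Therefore, I could be re-written as:
$$I = \frac{\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}(y_i-\bar{y})(y_j-\bar{y})}{\sum_{i=1}^{n}(y_i-\bar{y})^2}$$
Or:
$$I = \frac{\sum_{i=1}^{n}(y_i-\bar{y})\left(\sum_{j=1}^{n} w_{ij}(y_j-\bar{y})\right)}{\sum_{i=1}^{n}(y_i-\bar{y})^2}$$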
Moran’s scatterplot
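Recall the coefficient of the linear regression, b:
$$b = \frac{\sum_{i=1}^{n}(\mathrm{ind}_i-\overline{\mathrm{ind}})(\mathrm{dep}_i-\overline{\mathrm{dep}})}{\sum_{i=1}^{n}(\mathrm{ind}_i-\overline{\mathrm{ind}})^2}$$
$\mathrm{ind}_i$ and $\mathrm{dep}_i$ are the independent and dependent variables; the “bar” versions are their respective means; and b is the regression coefficient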
Moran’s scatterplot
Yes, there is a similarity between Moran’s I and the regression coefficient b
Actually, $\sum_{j=1}^{n} w_{ij}(y_j-\bar{y})$ is the so-called “spatial lag” of location i
So, I is formally equivalent to the regression coefficient in a regression of a location’s spatial lag on the value at that location
Moran’s Scatterplot
This interpretation enables us to visualize Moran’s I in a scatterplot of a location’s spatial lag and itself – the Moran’s scatterplot
Moran’s I is the slope of the regression line
A lack of fit (in the scatterplot) would indicate important local spatial process and associations (local pockets/non-stationarity)
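The equivalence between Moran’s I and the regression slope can be checked numerically. The sketch below assumes row-standardized weights and uses made-up values; the helper names `spatial_lag` and `ols_slope` are hypothetical.

```python
def spatial_lag(z, w):
    """Row-standardized spatial lag: the mean deviation among each location's neighbors."""
    lags = []
    for i, row in enumerate(w):
        total = sum(row)
        lags.append(sum(row[j] * z[j] for j in range(len(z))) / total if total else 0.0)
    return lags


def ols_slope(x, y):
    """Slope of the least-squares regression of y on x."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


# Hypothetical layout: four locations in a row, each adjacent to the next.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
y = [1, 2, 3, 4]
mean = sum(y) / len(y)
z = [v - mean for v in y]
lag = spatial_lag(z, chain)

slope = ols_slope(z, lag)                                   # slope in the scatterplot
moran = sum(a * b for a, b in zip(z, lag)) / sum(a * a for a in z)
print(slope, moran)                                         # both are 0.4
```

Plotting `lag` against `z` gives the Moran’s scatterplot; the fitted line through those points has slope I.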
Moran’s scatterplot
The scatterplot is centered on the coordinate origin
The first and third quadrants of the plot represent positive association (high-high and low-low), while the second and fourth represent negative association (low-high and high-low)
The density of points in each quadrant indicates the dominant local spatial process
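Assigning each location to a quadrant only requires the sign of its deviation and the sign of its spatial lag. The sketch below assumes row-standardized weights and treats a value of exactly zero as "high"; both the function name and that tie rule are assumptions made for illustration.

```python
def moran_quadrants(y, w):
    """Label each location by its quadrant in the Moran's scatterplot."""
    n = len(y)
    mean = sum(y) / n
    z = [v - mean for v in y]
    labels = []
    for i in range(n):
        total = sum(w[i])
        lag = sum(w[i][j] * z[j] for j in range(n)) / total if total else 0.0
        own = "high" if z[i] >= 0 else "low"       # the location's own value
        nbr = "high" if lag >= 0 else "low"        # the average of its neighbors
        labels.append(own + "-" + nbr)
    return labels


# Hypothetical layout: four locations in a row, each adjacent to the next.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
print(moran_quadrants([1, 2, 3, 4], chain))   # clustered values
print(moran_quadrants([1, 4, 1, 4], chain))   # alternating values
```

Counting the labels gives a quick numeric summary of which quadrants dominate.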
Moran’s scatterplot
A so-called LOWESS (LOcally Weighted rEgression Scatterplot Smoothing) curve can aid the visual effects
Turning of the LOWESS curve usually indicates interesting local pockets, regimes or non-stationarity
An example: demonstration in R
More about Moran’s Scatterplot
A very important ESDA tool for spatial data analysis
Further information can be found in: Anselin, L. (1996). The Moran scatterplot as an ESDA tool to assess local instability in spatial association. pp. 111–125 in M. M. Fischer, H. J. Scholten and D. Unwin (eds.) Spatial analytical perspectives on GIS, London: Taylor and Francis
An analytical example
Spatial pattern detection in China’s provincial development
The variable used: per capita GDP
Dynamic patterns – global Moran’s I
Specific local spatial process – local Moran’s I and the Moran’s scatterplot
[Figure: China: per capita GDP in 1978. Map showing the Eastern, Central and Western regions; legend classes in yuan: 175–291, 292–430, 431–680, 681–1290, 1291–2498]
[Figure: China: per capita GDP in 2000. Map showing the Eastern, Central and Western regions; legend classes in yuan: 869–1913, 1914–3162, 3163–4532, 4533–8411, 8412–15593]
An analytical example
[Figure: global Moran’s I (vertical axis, 0 to 0.25) by year (horizontal axis, 1978 to 2000)]
Dynamic change of global Moran’s I from 1978 to 2000; all values are significant at the 5% level under random permutation
An analytical example
There is a clustering trend in China’s provincial-level development (represented by per capita GDP)
But the global Moran’s I cannot tell on which side the clustering takes place: do high values cluster, or low values?
[Figure: The Moran’s scatterplot in 1978. Horizontal axis: GDP per capita (standardized); vertical axis: spatial lag of GDP per capita (standardized); points are labeled with province abbreviations]
[Figure: The Moran’s scatterplot in 2000. Horizontal axis: GDP per capita (standardized); vertical axis: spatial lag of GDP per capita (standardized); points are labeled with province abbreviations]
[Figure: Local Moran’s I in 1978. Map showing the Eastern, Central and Western regions; legend classes: < -0.3, -0.3 to 0, 0 to 0.3, 0.3 to 1.0, > 1.0]
[Figure: Local Moran’s I in 2000. Map showing the Eastern, Central and Western regions; legend classes: -0.3 to 0, 0 to 0.3, 0.3 to 1.0, > 1.0]
An analytical example
First, China’s coast-interior divide persisted
Interior provinces exhibit great geographical similarity in economic development and in their spatial contributions to the global Moran’s I
Second, the municipalities (Beijing, Tianjin, Shanghai) always contribute the most
Shanghai’s position is worth noting: its development changed the spatial pattern the most
An analytical example
Third, Guangdong’s contribution to the global index corresponds with its changing spatial behavior depicted in the Moran scatterplot
Fourth, while most of the interior provinces have similar patterns, coastal provinces vary greatly
An analytical example
Fifth, Shandong fell into the low-low quadrant, and contributed very little to the global index
Sixth, Guizhou and Yunnan, two provinces in southwest China, contributed relatively highly to the global index in 2000
The poorest ones tend to form a poor cluster
Demo – with R and SPDEP
A little demonstration
The software package R: freeware, powerful, open source
Packages: SPDEP and MAPTOOLS
If you have spatial data and are interested in using ESDA, you can approach me about your research