Spatial Association and spatial statistic techniques

Danlin Yu

Ph.D. Candidate

Dept. of Geography, UWM

Detecting Spatial Association

What is spatial association? Spatial objects tend to relate to one another

Types of spatial association:

Spatial autocorrelation: similar (or dissimilar) values tend to cluster together in space

Spatial heterogeneity: spatial regimes exist; space is not homogeneous

Autocorrelation and heterogeneity are closely related

Why study spatial association?

It is inherent in geographic research

When working with spatial data, analyses based on conventional statistics are very likely to be misleading or incorrect

How to detect spatial association:

The power of GIS

Exploratory Spatial Data Analysis (ESDA): let the data speak

Background

The first law of geography: everything is related, but things nearby are more related than things far away

Characteristics of spatial statistics:

The existence of spatial association violates an important statistical assumption: independence

Spatial patterns are the results of spatial processes; the pattern we see is one of numerous possible outcomes of the same spatial process

Types of spatial association

Point spatial association: distance is critical in deciding point spatial association

Line spatial association: distance and path

Areal spatial association: distance and contiguity

Today’s topic: univariate spatial association (SA)

Univariate: for pattern detection

Examples: per capita GDP for the pattern of economic performance; surface temperature for the pattern of local climate, etc.

Central question: is the pattern we see the result of some specific process (usually a random or normal process, which is our null hypothesis)?

Multivariate: spatial regression or geographically weighted regression (GWR)

Research methods

Hypothesis testing to answer this question is conducted by means of spatial statistics

For univariate geographic data, there are a few indexes in the literature:

Moran’s Index (Moran’s I)

Geary’s Index (Geary’s c)

Getis’s G or O

Spatial statistic indexes

The purposes of the three indexes are very similar: calculate an index from the geographic data, then test the index against the null hypothesis

The most commonly encountered index is Moran’s I

The discussion of Moran’s I applies to the other indexes with minor adjustments

Moran’s Index (I)

Structured like the Pearson’s product-moment statistic: measure of covariance

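\[
I = \frac{n}{\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}} \cdot \frac{\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}\,(y_i - \bar{y})(y_j - \bar{y})}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
\]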

Moran’s I

w_ij is the weight: w_ij = 1 if locations i and j are adjacent and 0 otherwise (w_ii = 0; a region is not adjacent to itself)

y_i and ȳ are the value of the variable at the ith location and the mean of the variable, respectively

n is the total number of observations

I is used to test hypotheses concerning similarity
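The definition above can be sketched in a few lines of plain Python. This is only an illustration, not the SPDEP implementation used in the demo: the function name morans_i, the five-location layout and the sample values are all assumptions made for the example.

```python
def morans_i(values, weights):
    """Global Moran's I for a list of values and an n x n weights matrix."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]           # deviations from the mean
    s0 = sum(sum(row) for row in weights)    # sum of all weights w_ij
    cross = sum(weights[i][j] * z[i] * z[j]
                for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(d * d for d in z)


# Hypothetical layout: five locations in a row, adjacent ones are neighbors.
W = [[0, 1, 0, 0, 0],
     [1, 0, 1, 0, 0],
     [0, 1, 0, 1, 0],
     [0, 0, 1, 0, 1],
     [0, 0, 0, 1, 0]]

clustered = [1, 2, 3, 4, 5]      # similar values sit next to each other
alternating = [1, 5, 1, 5, 1]    # dissimilar values sit next to each other

print(morans_i(clustered, W))    # positive: 0.5
print(morans_i(alternating, W))  # negative
```

A positive value points to neighbors with similar values, a negative value to neighbors with dissimilar values; whether either is significant still has to be tested against the null hypothesis.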

Determining the weights

Two rules:

Distance: locations within a certain distance are considered neighbors

Border-sharing (for areal units only): areas sharing borders are considered neighbors

Weights matrix: can be symmetric or asymmetric; either a binary weights matrix or a general weights matrix (distance decaying)
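As a rough illustration of the distance rule, the sketch below builds a binary weights matrix and a distance-decaying one from point coordinates. The function names, the inverse-distance form of the decay and the sample coordinates are assumptions chosen for the example; other decay functions are equally possible.

```python
import math


def distance_band_weights(points, threshold):
    """Binary weights: w_ij = 1 if i and j are within the threshold distance."""
    n = len(points)
    weights = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(points[i], points[j]) <= threshold:
                weights[i][j] = 1
    return weights


def inverse_distance_weights(points, threshold):
    """General weights: w_ij = 1 / d_ij within the threshold, 0 beyond it."""
    n = len(points)
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d = math.dist(points[i], points[j])
            if i != j and 0 < d <= threshold:
                weights[i][j] = 1.0 / d
    return weights


# Hypothetical coordinates of three locations.
points = [(0, 0), (1, 0), (3, 0)]
print(distance_band_weights(points, 1.5))     # third point has no neighbors
print(inverse_distance_weights(points, 2.5))  # closer pairs get larger weights
```

With a small threshold some locations may end up with no neighbors at all, which is one reason to try several distances.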


Spatial weights matrix should be constructed judiciously

Ideally, related to general concepts from spatial interaction theory, such as the notions of accessibility and potential etc.


When used in hypothesis testing, this requirement is less stringent, since our purpose is only to test the null hypothesis of spatial independence

Still, trying a few structures is a good idea – border sharing, different distances


A typical symmetric weights matrix is a binary weights matrix where neighbors are coded as 1, others 0

Without loss of generality, it is usually row-standardized: all elements of each row add up to 1
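Row standardization can be sketched as follows. The helper name row_standardize and the four-region matrix are assumptions for illustration, and leaving a neighborless row as all zeros is just one possible convention.

```python
def row_standardize(weights):
    """Divide each row by its sum so that every non-empty row adds up to 1."""
    result = []
    for row in weights:
        total = sum(row)
        # Assumption: a location without neighbors keeps an all-zero row.
        result.append([w / total if total else 0.0 for w in row])
    return result


# Hypothetical symmetric binary matrix for four regions.
W = [[0, 1, 1, 0],
     [1, 0, 1, 1],
     [1, 1, 0, 0],
     [0, 1, 0, 0]]

W_std = row_standardize(W)
for row in W_std:
    print(row)
```

Note that the standardized matrix is generally no longer symmetric: here region 1 gives region 2 a weight of 1/2, while region 2 gives region 1 a weight of 1/3, because the two regions have different numbers of neighbors.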

Hypothesis testing

The expected value and the variance of Moran’s I are used for testing

However, it is observed that under the null hypothesis Moran’s I usually does not follow a normal distribution

Alternatives:

Random permutation

Saddlepoint approximation

Monte Carlo (random) permutation for Moran’s I

Randomly rearrange the values over the locations and calculate I each time (e.g., 999 times)

Compare the actual I with the 999 randomly obtained values of I

If the actual I falls above the 95th percentile or below the 5th percentile of these values, I is said to be pseudo-significant at the 5% level (positive or negative association, respectively)
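The permutation procedure can be sketched in plain Python as below. This is an illustrative sketch rather than the SPDEP routine: the function names, the fixed random seed, the 20-location layout and the convention of counting the observed value as one extra draw when forming the pseudo p-value are all assumptions.

```python
import random


def morans_i(values, weights):
    """Global Moran's I for a list of values and an n x n weights matrix."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)
    cross = sum(weights[i][j] * z[i] * z[j]
                for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(d * d for d in z)


def permutation_test(values, weights, n_perm=999, seed=0):
    """Return the observed I and one-sided pseudo p-values (positive, negative)."""
    rng = random.Random(seed)          # fixed seed so the run is repeatable
    observed = morans_i(values, weights)
    shuffled = list(values)
    simulated = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)          # reassign the values to random locations
        simulated.append(morans_i(shuffled, weights))
    # Share of values (including the observed one) at least as extreme.
    p_positive = (1 + sum(s >= observed for s in simulated)) / (n_perm + 1)
    p_negative = (1 + sum(s <= observed for s in simulated)) / (n_perm + 1)
    return observed, p_positive, p_negative


# Hypothetical data: 20 locations in a row with a steadily increasing value.
n = 20
W = [[1 if abs(i - j) == 1 else 0 for j in range(n)] for i in range(n)]
values = list(range(n))

observed, p_positive, p_negative = permutation_test(values, W)
print(round(observed, 3), p_positive, p_negative)
```

For this strongly clustered example the positive pseudo p-value should be very small, so the observed I would be called pseudo-significant for positive association.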

Saddlepoint approximation (Tiefelsdorf, 2001)

The exact distribution of Moran’s I can be obtained, but it is computationally prohibitive even for medium-sized data sets

A saddlepoint distribution approximates the exact distribution with reasonable accuracy

It is based on a ratio of quadratic forms in normal variables

Usually, random permutation will do the job

Global and local (1)

The Moran’s I just introduced is based on simultaneous measurements from many locations; hence, it is a GLOBAL statistic

Global statistics provide only a limited set of spatial association measurements

You see the overall pattern, but the details are ignored: the tree-and-forest dilemma


Recently, a number of statistics have been developed to measure dependence in only a portion of the study area: the local statistics

In spatial data analysis, they are known as Local Indicators of Spatial Association (LISA), a name introduced by Anselin (1995)


Definition of LISA (Anselin, 1995):

The local statistic for each observation gives an indication of the extent of significant spatial clustering of similar values around that observation

The sum of the local statistics for all observations is proportional (or equal) to a corresponding global statistic


Local statistics are well suited to:

Identify the existence of pockets or “hot spots”

Assess assumptions of stationarity

Identify distances beyond which no discernible association exists

Global and local statistics are often used together for thorough understanding of spatial association and processes


This discussion is based on the decomposition of Moran’s I into its local version

Other indexes can be decomposed similarly; however, there is an important aspect of Moran’s I that will assist further understanding in spatial analysis

It can be decomposed into its local version, AND a graphic version – Moran’s scatterplot

Local Moran’s I

Following Anselin’s (1995) definition, a local Moran’s I_i may be defined as:

\[
I_i = z_i \sum_{j=1}^{n} w_{ij} z_j
\]

The z_i are the deviations of the y_i from their mean

The weights are row-standardized
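A minimal sketch of this definition in plain Python follows; the function names, the row-standardized five-location matrix and the sample values are assumptions made for illustration. It also checks the LISA property that the local values add up to something proportional to the global statistic: with row-standardized weights and no neighborless locations, the global I equals the sum of the I_i divided by the sum of the squared deviations.

```python
def local_morans_i(values, weights):
    """Local Moran's I_i = z_i * sum_j w_ij z_j for every location i."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    return [z[i] * sum(weights[i][j] * z[j] for j in range(n))
            for i in range(n)]


def morans_i(values, weights):
    """Global Moran's I, used here only to check the decomposition."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)
    cross = sum(weights[i][j] * z[i] * z[j]
                for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(d * d for d in z)


# Hypothetical row-standardized weights for five locations in a row.
W = [[0.0, 1.0, 0.0, 0.0, 0.0],
     [0.5, 0.0, 0.5, 0.0, 0.0],
     [0.0, 0.5, 0.0, 0.5, 0.0],
     [0.0, 0.0, 0.5, 0.0, 0.5],
     [0.0, 0.0, 0.0, 1.0, 0.0]]
values = [1.0, 2.0, 3.0, 9.0, 10.0]

local = local_morans_i(values, W)
mean = sum(values) / len(values)
sum_sq = sum((v - mean) ** 2 for v in values)

print(local)
print(sum(local) / sum_sq, morans_i(values, W))  # the two numbers agree
```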

Hypothesis test for local Moran’s I is more complex

The distribution of local Moran’s I is definitely not normal; furthermore, it is influenced by the global pattern

Random permutation won’t work: for one specific location, the mean and variance of the local Moran’s I keep changing during the permutation, which is not the case for the global statistic

Exact distribution of local Moran’s I can be obtained, but extremely computationally prohibitive

Saddlepoint approximation is so far one potential solution

Details can be found in Tiefelsdorf (2000; 2002)

In addition, the local Moran’s I values are correlated with one another because of overlapping neighbors

Bonferroni correction or other correction methods are needed to obtain robust testing results

All of these can be done with the SPDEP package in R
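The idea of the Bonferroni correction can be sketched as follows: the overall significance level is divided by the number of local tests. The function name and the p-values below are hypothetical and serve only as an illustration; this is not the SPDEP interface.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Flag tests whose p-value is at or below alpha / (number of tests)."""
    threshold = alpha / len(p_values)
    return [p <= threshold for p in p_values]


# Hypothetical p-values for the local Moran's I of five locations.
p_values = [0.001, 0.012, 0.030, 0.200, 0.004]

# The threshold is 0.05 / 5 = 0.01, so only the first and last remain.
print(bonferroni_significant(p_values))
# [True, False, False, False, True]
```

Without the correction, three of the five locations would pass the 5% level; with it, only two do.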

Moran’s scatterplot

A graphic tool for detecting local spatial association

Derived directly from the global Moran’s I

It can be used together with the local Moran’s I for better understanding


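Recall the formula of Moran’s I:

\[
I = \frac{n}{\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}} \cdot \frac{\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}\,(y_i - \bar{y})(y_j - \bar{y})}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
\]

If a row-standardized weights matrix is used, the first term will be 1, because every row of weights sums to 1 and the sum of all weights is therefore n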


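Therefore, I can be rewritten as:

\[
I = \frac{\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}\,(y_i - \bar{y})(y_j - \bar{y})}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
\]

Or:

\[
I = \frac{\sum_{i=1}^{n} (y_i - \bar{y}) \left( \sum_{j=1}^{n} w_{ij}\,(y_j - \bar{y}) \right)}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
\]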


Recall the coefficient of the linear regression, b:

\[
b = \frac{\sum_{i=1}^{n} (ind_i - \overline{ind})(dep_i - \overline{dep})}{\sum_{i=1}^{n} (ind_i - \overline{ind})^2}
\]

ind_i and dep_i are the independent and dependent variables; the “bar” versions are their means, respectively; and b is the regression coefficient


Yes, there is a similarity between Moran’s I and the regression coefficient b

In fact, \sum_{j=1}^{n} w_{ij}\,(y_j - \bar{y}) is the so-called “spatial lag” of location i

So, I is formally equivalent to the regression coefficient in a regression of a location’s spatial lag on the location’s own value
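This equivalence can be checked numerically. The sketch below, in plain Python with hypothetical function names, weights and values, computes the spatial lag of each location, fits an ordinary least-squares slope of the lag on the location's own deviation, and compares it with Moran's I. It assumes row-standardized weights in which every location has at least one neighbor.

```python
def spatial_lag(z, weights):
    """Spatial lag of each location: the weighted sum of its neighbors' values."""
    return [sum(w * zj for w, zj in zip(row, z)) for row in weights]


def ols_slope(x, y):
    """Slope b of the least-squares regression of y on x."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    return (sum((a - mx) * (b - my) for a, b in zip(x, y))
            / sum((a - mx) ** 2 for a in x))


def morans_i(values, weights):
    """Global Moran's I for a list of values and an n x n weights matrix."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)
    cross = sum(weights[i][j] * z[i] * z[j]
                for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(d * d for d in z)


# Hypothetical row-standardized weights for five locations in a row.
W = [[0.0, 1.0, 0.0, 0.0, 0.0],
     [0.5, 0.0, 0.5, 0.0, 0.0],
     [0.0, 0.5, 0.0, 0.5, 0.0],
     [0.0, 0.0, 0.5, 0.0, 0.5],
     [0.0, 0.0, 0.0, 1.0, 0.0]]
values = [3.0, 4.0, 8.0, 9.0, 2.0]

mean = sum(values) / len(values)
z = [v - mean for v in values]
slope = ols_slope(z, spatial_lag(z, W))

print(slope, morans_i(values, W))  # the two numbers agree
```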

Moran’s Scatterplot

This interpretation enables us to visualize Moran’s I in a scatterplot of a location’s spatial lag and itself – the Moran’s scatterplot

Moran’s I is the slope of the regression line

A lack of fit (in the scatterplot) would indicate important local spatial process and associations (local pockets/non-stationarity)


The scatterplot is centered on the coordinate origin

The first and third quadrants of the plot represent positive association (high-high and low-low), while the second and fourth represent negative association (low-high and high-low, respectively)

The density of points in each quadrant indicates the dominating local spatial process


A so-called LOWESS (LOcally Weighted rEgression Scatterplot Smoothing) curve can aid visual interpretation

Bends in the LOWESS curve usually indicate interesting local pockets, regimes or non-stationarity
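The quadrant reading of the scatterplot can be sketched without any plotting: each location is labeled by the sign of its own deviation from the mean and the sign of its spatial lag. The function name, the treatment of exact zeros as "high", and the sample data are assumptions made for this illustration.

```python
def moran_quadrants(values, weights):
    """Label each location with its quadrant in the Moran scatterplot."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]                 # own value, centered
    lag = [sum(weights[i][j] * z[j] for j in range(n)) for i in range(n)]
    labels = []
    for own, neighbors in zip(z, lag):
        # Assumption: a value of exactly zero is treated as "high".
        first = "high" if own >= 0 else "low"
        second = "high" if neighbors >= 0 else "low"
        labels.append(first + "-" + second)        # own value, then neighbors
    return labels


# Hypothetical row-standardized weights for five locations in a row.
W = [[0.0, 1.0, 0.0, 0.0, 0.0],
     [0.5, 0.0, 0.5, 0.0, 0.0],
     [0.0, 0.5, 0.0, 0.5, 0.0],
     [0.0, 0.0, 0.5, 0.0, 0.5],
     [0.0, 0.0, 0.0, 1.0, 0.0]]
values = [1.0, 2.0, 3.0, 9.0, 10.0]

print(moran_quadrants(values, W))
# ['low-low', 'low-low', 'low-high', 'high-high', 'high-high']
```

In this example the third location is a low value next to a high-valued neighborhood, so it falls in the second quadrant, while the two ends form a low-low and a high-high cluster.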

An example: demonstration in R

More about Moran’s Scatterplot

A very important ESDA tool for spatial data analysis

Further information can be found in: Anselin, L. (1996). The Moran scatterplot as an ESDA tool to assess local instability in spatial association. pp. 111–125 in M. M. Fischer, H. J. Scholten and D. Unwin (eds.), Spatial analytical perspectives on GIS. London: Taylor and Francis

An analytical example

Spatial pattern detection in China’s provincial development

The variable used: per capita GDP

Dynamic patterns – global Moran’s I

Specific local spatial process – local Moran’s I and the Moran’s scatterplot

[Figure: China: per capita GDP in 1978. Map of the Eastern, Central and Western Regions; classes in yuan: 175-291, 292-430, 431-680, 681-1290, 1291-2498]

[Figure: China: per capita GDP in 2000. Map of the Eastern, Central and Western Regions; classes in yuan: 869-1913, 1914-3162, 3163-4532, 4533-8411, 8412-15593]


[Figure: Dynamic change of global Moran’s I from 1978 to 2000. x-axis: year; y-axis: global Moran’s I (scale 0 to 0.25). All values are significant at the 5% level per random permutation]


There is a clustering trend in China’s provincial-level development (represented by per capita GDP)

But the global Moran’s I can’t tell on which side the clustering takes place: do high values cluster, or low values?

[Figure: The Moran’s scatterplot in 1978. x-axis: GDP per capita (standardized); y-axis: spatial lag of GDP per capita (standardized); points labeled with province abbreviations]

[Figure: The Moran’s scatterplot in 2000. x-axis: GDP per capita (standardized); y-axis: spatial lag of GDP per capita (standardized); points labeled with province abbreviations]

[Figure: Local Moran’s I in 1978. Map of the Eastern, Central and Western Regions; classes: < -0.3, -0.3 to 0, 0 to 0.3, 0.3 to 1.0, > 1.0]

[Figure: Local Moran’s I in 2000. Map of the Eastern, Central and Western Regions; classes: -0.3 to 0, 0 to 0.3, 0.3 to 1.0, > 1.0]


First, China’s coast-interior divide persisted

Interior provinces exhibit great geographical similarity in economic development and in their spatial contributions to the global Moran’s I

Second, the municipalities (Beijing, Tianjin, Shanghai) always contribute the most

Shanghai’s position is worth noting: its development changed the spatial pattern the most


Third, Guangdong’s contribution to the global index corresponds with its changing spatial behavior depicted in the Moran scatterplot

Fourth, while most of the interior provinces have similar patterns, coastal provinces vary greatly


Fifth, Shandong fell into the low-low quadrant, and contributed very little to the global index

Sixth, Guizhou and Yunnan, two provinces in southwest China, contributed relatively strongly to the global index in 2000

The poorest ones tend to form a poor cluster

Demo – with R and SPDEP

A little demonstration

The software package R: freeware, powerful, open source

Packages: SPDEP and MAPTOOLS

If you have spatial data and are interested in using ESDA, you can approach me about your research
