![Page 1: Introduction to Spatial Regression Glen Johnson, PhD Lehman College / CUNY School of Public Health glen.johnson@lehman.cuny.edu.](https://reader030.fdocuments.net/reader030/viewer/2022032703/56649d0e5503460f949e41ff/html5/thumbnails/1.jpg)
Introduction to Spatial Regression
Glen Johnson, PhD
Lehman College / CUNY School of Public Health
Typical scenario:
• Have a health outcome and covariables aggregated at a common geographic level, such as counties, census tracts, ZIP codes …
• Want to measure association between the outcome and the covariables.
• The specific question is: are there variables that co-vary spatially with the outcome variable?
Lung Cancer Rates (observed) = Benzene in ambient air + Smoking rate + … + ? + residual
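Consider the linear model:

For $i = 1, \ldots, n$,

$$y_i = \mathbf{x}_i \boldsymbol{\beta} + \varepsilon_i$$

or, more generally,

$$g(E[y_i]) = \mathbf{x}_i \boldsymbol{\beta}$$

and $\varepsilon_i \overset{iid}{\sim} N(0, \sigma^2)$ with $\operatorname{cov}(\varepsilon_i, \varepsilon_j) = 0$ for all $i \neq j$, where $\varepsilon_i = y_i - \hat{y}_i$.

This is the point of departure.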
When applying regression modeling to spatial units that are connected in space (lattice data), the critical assumption that residuals are independently distributed with constant variance is typically violated.
Tobler’s First Law of Geography: Things closer in space tend to be more similar than things further apart
When we model the expected value, E[y], as a function of spatially-varying covariates, it is possible that we may explain all of the spatial variation of the observed response, y, with the covariates, leaving uncorrelated residuals.
When this is not the case, as is typical, the assumption of iid residuals is violated and we will obtain biased estimates of the variance. The bias is typically downward, so we underestimate our standard errors and conclude that some covariates are significant when in fact they are not.
Tests for spatial autocorrelation should be applied to residuals if a “conventional” regression model is applied.
This may be done with various software packages or GIS add-ons.
A common statistic is Moran’s I, which equals
$$I = \frac{\sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij} (Y_i - \bar{Y})(Y_j - \bar{Y})}{\frac{1}{n} \sum_{i=1}^{n} (Y_i - \bar{Y})^2 \; \sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij}}$$
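As a rough illustration of the statistic, the sketch below computes Moran's I in plain Python. The function name `morans_i`, the four-region layout, and the binary adjacency weights are assumptions made for this example; they are not taken from the slides.

```python
def morans_i(values, weights):
    """Moran's I for a list of values and an n x n spatial weight matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    # Numerator: weighted cross-products of deviations for every pair (i, j).
    cross = sum(weights[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n))
    # Denominator: (1/n) * sum of squared deviations, times the sum of all weights.
    variance = sum(d * d for d in dev) / n
    total_weight = sum(sum(row) for row in weights)
    return cross / (variance * total_weight)


# Hypothetical layout: four regions in a row, each adjacent to its neighbors.
w = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
smooth = [1.0, 2.0, 3.0, 4.0]        # similar values sit next to each other
alternating = [1.0, 4.0, 1.0, 4.0]   # neighbors are as different as possible

print(morans_i(smooth, w))        # about 0.33 (positive autocorrelation)
print(morans_i(alternating, w))   # -1.0 (negative autocorrelation)
```

When testing a fitted regression, `values` would be the model residuals rather than the raw outcome.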
When residual spatial autocorrelation is present, several approaches may be taken to adjust for it.
The simplest is to add a fixed-effect dummy variable that allows the model intercept to change with spatial location. For example, the intercept is adjusted according to the county $c$ to which unit $i$ belongs:

$$y_i = \mathbf{x}_i \boldsymbol{\beta} + \gamma_c + \varepsilon_i$$

where $\gamma_c$ is the intercept shift for county $c$.

This is essentially stratifying the analysis by location, and can be done with any statistical software.
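A minimal sketch of the county fixed-effect idea, assuming a single covariate and made-up data. Subtracting county means from x and y before fitting gives the same slope as adding one dummy variable per county. The name `fit_county_fixed_effects` and the symbol `gamma` for the county intercepts are choices made for this example only.

```python
from collections import defaultdict


def fit_county_fixed_effects(x, y, county):
    """Least-squares fit of y_i = beta * x_i + gamma_c + e_i for one covariate.

    gamma_c is a separate intercept for each county (there is no additional
    overall intercept in this sketch).
    """
    members = defaultdict(list)
    for i, c in enumerate(county):
        members[c].append(i)
    # County means of the covariate and the response.
    x_bar = {c: sum(x[i] for i in idx) / len(idx) for c, idx in members.items()}
    y_bar = {c: sum(y[i] for i in idx) / len(idx) for c, idx in members.items()}
    # Slope from within-county deviations only.
    sxy = sum((x[i] - x_bar[c]) * (y[i] - y_bar[c]) for i, c in enumerate(county))
    sxx = sum((x[i] - x_bar[c]) ** 2 for i, c in enumerate(county))
    beta = sxy / sxx
    # Each county intercept makes the fitted line pass through the county means.
    gamma = {c: y_bar[c] - beta * x_bar[c] for c in members}
    return beta, gamma


# Hypothetical data: two counties sharing a slope of 2 but with different levels.
x = [1.0, 2.0, 3.0, 1.0, 2.0, 3.0]
y = [12.0, 14.0, 16.0, 22.0, 24.0, 26.0]
county = ["A", "A", "A", "B", "B", "B"]
beta, gamma = fit_county_fixed_effects(x, y, county)
print(beta, gamma)
```

Ignoring county here would blend the level difference between A and B into the residuals, which is the kind of spatial structure the dummy variable is meant to absorb.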
Since spatial location is a proxy for unobserved randomly varying covariables, it is more correctly treated as a random effect in a mixed effect model, such as
$$E[y_i \mid S(i)] = \mathbf{x}_i \boldsymbol{\beta} + S(i), \quad \text{where } S(i) \sim N(0, \sigma_s^2)$$

which can be fit through pseudo-likelihood methods, using software such as PROC GLIMMIX or PROC MIXED in SAS, or R with an appropriate library (?)
Illustration: Community Teen Pregnancy Rates vs. Socioeconomic Status and Demographic Composition
For each ZIP code:
Response (i.e. Teen Pregnancy cases)
Predictors:
• % pop. > age 24 w/ 4-year or greater college degree
• % single-parent households out of households w/ at least one child < 18 years old
• % of tot. pop. that is Black Alone
• % of tot. pop. that is Hispanic, regardless of race
• % of tot. pop. that is a foreign-born naturalized citizen
• % of tot. pop. with income below poverty
Population at Risk
County (crude indicator of neighborhood effect)
[Figure: plots of pregnancies per 1000 females age 15-19 against each predictor. Panels: Teen Preg rate vs Education (% adults with 4-yr college degree); Teen Preg rate vs Single-Parent Households (% households with one parent at home); Teen Preg rate vs % Immigrants (% foreign-born naturalized citizens); . . .]
[Figure: plots of pregnancies per 1000 females age 15-19 against each predictor. Panels: Teen Preg rate vs Race (% Black alone, regardless of Hispanic ethnicity); Teen Preg rate vs % Hispanic (% Hispanic, regardless of race).]
The Model …
For $i = 1, \ldots, n$ ZIP codes, let
$y_i$ = observed caseload
$n_i$ = population at risk
$\{x_1, \ldots, x_p\}_i$ = community predictors
$\{\beta_1, \ldots, \beta_p\}$ = coefficients
$L_i$ = location effect, arising from a random process such that $L_i \sim N(0, \sigma_L^2)$
Then, the expected value of $y_i$, given $\{x_1, \ldots, x_p, L\}_i$, is

$$E[y_i \mid \{x_1, \ldots, x_p, L\}_i] = n_i \exp(\beta_1 x_{1i} + \cdots + \beta_p x_{pi} + L_i)$$
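To make the expected-value formula concrete, here is a small sketch that evaluates it for one ZIP code. The population, predictor value, and coefficients are invented for illustration (they are not the fitted values reported later), and the intercept is assumed to enter as a predictor fixed at 1.

```python
import math


def expected_caseload(n_at_risk, x, beta, location_effect=0.0):
    """E[y_i | x, L] = n_i * exp(beta_1 * x_1i + ... + beta_p * x_pi + L_i)."""
    linear_predictor = sum(b * v for b, v in zip(beta, x)) + location_effect
    return n_at_risk * math.exp(linear_predictor)


# Hypothetical ZIP code: 2000 females at risk, intercept column plus one predictor.
x_i = [1.0, 30.0]      # [intercept, % single-parent households], made-up values
beta = [-3.0, 0.02]    # made-up coefficients
baseline = expected_caseload(2000, x_i, beta)
shifted = expected_caseload(2000, x_i, beta, location_effect=0.1)
print(baseline, shifted)
```

Because the location effect sits inside the exponential, a value of 0.1 multiplies the expected caseload by exp(0.1), roughly a 10.5% increase, whatever the covariates are.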
• Values for the unknown parameters $\{\beta_1, \ldots, \beta_p, \sigma_L^2\}$ are estimated with SAS PROC GLIMMIX, assuming $y_i$ arose from a Poisson random process, conditional on location.
• … thus allowing risk-adjusted estimates of caseload for each ZIP code.
• Incorporating the “location effect”
- adjusts for unidentified covariables that co-vary spatially with the response, thus reducing residual spatial autocorrelation and potential confounding
- also provides a “smoothing” effect, in that the predicted caseload is adjusted towards a common local value
Teen Pregnancy Association with Select Covariables

| Coefficient name | No spatial effect: estimate | No spatial effect: p-value | With spatial effect: estimate | With spatial effect: p-value |
|---|---|---|---|---|
| intercept | -3.423 | <0.0001 | -3.262 | <0.0001 |
| % adults w/ Bachelors | -0.016 | <0.0001 | -0.018 | <0.0001 |
| % Black Alone | 0.008 | <0.0001 | 0.01 | <0.0001 |
| % Hispanic | 0.009 | <0.0001 | 0.012 | <0.0001 |
| % Foreign Born | 0.003 | 0.2884 | 0.002 | 0.5906 |
| % single-parent households | 0.04 | <0.0001 | 0.027 | <0.0001 |

| Model parameter | No spatial effect | With spatial effect |
|---|---|---|
| scale | 0.166 | 0.009 |
| chi-square / d.f. | 1.13 | 0.91 |
| -2 log likelihood | 7934.5 | 2706.6 |
| Residual Spatial Autocorrelation (Moran's I) | 0.92 | 0.31 |
[Map: Deviation of Observed from Model-Predicted Teen Pregnancy Rates (3-Year Average for the Year 2005), No Spatial Correction. Moran's I = 0.92. Pearson residual classes: -4.4 to -1.3; -1.2 to 0.73; 0.74 to 3.7; 3.8 to 9.5; 9.6 to 23. County boundaries shown. New York City. April, 2009.]
[Map: Deviation of Observed from Model-Predicted Teen Pregnancy Rates (3-Year Average for the Year 2005), with Spatial Random Effect. Moran's I = 0.31. Pearson residual classes: -2.1 to -0.63; -0.62 to 0.0092; 0.0093 to 0.82; 0.83 to 2.2; 2.3 to 8.0. County boundaries shown. New York City. April, 2009.]
Other approaches include …
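A spatial lag model, where

$$y_i = \rho \sum_j w_{ij} y_j + \mathbf{x}_i \boldsymbol{\beta} + \varepsilon_i$$

and a spatial error model, where

$$y_i = \rho \sum_j w_{ij} \varepsilon_j + \mathbf{x}_i \boldsymbol{\beta} + \varepsilon_i$$

for a spatial autoregressive coefficient $\rho$.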
These two models differ by whether the adjustment is made by a weighted sum of the response variable or the residuals.
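The difference between the two models can be sketched by generating responses from each. The code below is an illustration only: it assumes a row-standardized weight matrix, made-up values for the fitted part `xb` and the residuals `eps`, and it solves the lag equation by simple fixed-point iteration rather than by the estimation methods used in GeoDa or R.

```python
def row_standardize(w):
    """Scale each row of a weight matrix so that it sums to 1."""
    return [[v / sum(row) for v in row] for row in w]


def weighted_sum(w, v):
    """Return the list of sums over j of w[i][j] * v[j], one entry per unit i."""
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]


def spatial_lag(w, rho, xb, eps, iterations=200):
    """Solve y = rho * W y + xb + eps by fixed-point iteration.

    The iteration converges when W is row-standardized and |rho| < 1.
    """
    y = [m + e for m, e in zip(xb, eps)]
    for _ in range(iterations):
        wy = weighted_sum(w, y)
        y = [rho * s + m + e for s, m, e in zip(wy, xb, eps)]
    return y


def spatial_error(w, rho, xb, eps):
    """y = rho * W eps + xb + eps: the neighbors' residuals enter directly."""
    we = weighted_sum(w, eps)
    return [rho * s + m + e for s, m, e in zip(we, xb, eps)]


# Hypothetical example: four regions in a row.
w = row_standardize([[0, 1, 0, 0],
                     [1, 0, 1, 0],
                     [0, 1, 0, 1],
                     [0, 0, 1, 0]])
xb = [1.0, 2.0, 3.0, 4.0]        # made-up values of x_i * beta
eps = [0.5, -0.5, 0.25, -0.25]   # made-up residuals
y_lag = spatial_lag(w, 0.5, xb, eps)
y_err = spatial_error(w, 0.5, xb, eps)
print(y_lag)
print(y_err)
```

In the lag version each response depends on its neighbors' responses, so an effect in one region propagates through the whole map; in the error version only the neighbors' residuals are shared. With `rho = 0` both reduce to the ordinary linear model.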
The spatial lag and spatial error models can be fit in GeoDa, a simple, well-supported free program found at
http://geodacenter.asu.edu/
… but only for Gaussian responses.
For generalized linear models (e.g. Poisson and logistic regression), see R with appropriate libraries
http://www.r-project.org/
Another approach is hierarchical modelling, which treats the response as conditional on the weighted average of local neighborhood errors.
Frequentist solutions exist, but these hierarchical models lend themselves well to a fully Bayesian solution, as used by many geographic epidemiologists.
Main advantages include
* flexibility offered by Generalized Linear Mixed Models
* obtaining the full distribution of possible outcomes
  - allows many ways to view the outcome (mean, median, percentiles)
  - inference based on actual probability distributions, instead of confidence intervals
Main limitation is the level of conceptual difficulty; however, implementation is accessible through free software …
WinBUGS (Bayesian inference Using Gibbs Sampling)
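A Hierarchical Model

1. Define a likelihood for the observations $y_i$, where $i = 1, \ldots, n$ regions:

$$y_i \sim \text{Poisson}(\mu_i), \quad \text{where } \mu_i = E_i \theta_i,$$

$E_i$ is the calculated expected value and $\theta_i$ is the relative risk
(note: the max. likelihood estimate of $\theta_i$ is $\hat{\theta}_i = y_i / E_i$, the SIR)

2. Link the Poisson expectation to both fixed and random effects:

$$\log(\mu_i) = \log(E_i) + \alpha + \sum_{j=1}^{k} \beta_j x_{ij} + \varepsilon_i^{u} + \varepsilon_i^{s}$$

for a common mean $\alpha$, fixed effect covariates $x_{ij}$ with coefficients $\beta_j$, and random effects (components of variance) $\varepsilon_i^{u}$ and $\varepsilon_i^{s}$ due to unstructured and spatially structured sources of variation

3. Assign prior probability distributions to parameters in the linear model

$$\alpha \sim N(0, 10^4), \quad \beta_j \sim N(0, 10^4) \text{ for all } j, \quad \varepsilon_i^{u} \sim N(0, \sigma_u^2)$$

and $[\varepsilon_i^{s} \mid \varepsilon_j^{s},\ j \neq i] \sim N(\bar{\varepsilon}_i, \sigma_i^2)$ for the spatial neighborhood of region $i$

4. Assign hyperprior distributions to the hyperparameters

$$\tau_u = 1 / \sigma_u^2 \sim \text{Gamma}(a, b) \quad \text{and} \quad \tau_s = 1 / \sigma_s^2 \sim \text{Gamma}(c, d)$$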
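The first two levels of the hierarchy can be sketched numerically. The code below is a hedged illustration with invented counts: it builds the Poisson mean from the log-linear link and checks that the Poisson log-likelihood for one region peaks at a relative risk of y / E, the SIR. The function names are hypothetical and no priors are involved at this stage.

```python
import math


def poisson_mean(expected, alpha, beta, x, eps_u, eps_s):
    """mu_i = E_i * exp(alpha + sum_j beta_j * x_ij + eps_u_i + eps_s_i)."""
    log_relative_risk = alpha + sum(b * v for b, v in zip(beta, x)) + eps_u + eps_s
    return expected * math.exp(log_relative_risk)


def poisson_log_likelihood(y, mu):
    """Log-probability of a Poisson count y with mean mu."""
    return y * math.log(mu) - mu - math.lgamma(y + 1)


# Hypothetical region: 12 observed cases against 8 expected.
y_i, E_i = 12, 8.0
sir_hat = y_i / E_i          # maximum likelihood estimate of the relative risk
mu_mle = E_i * sir_hat       # Poisson mean implied by the SIR
mu_null = poisson_mean(E_i, 0.0, [0.0], [1.0], 0.0, 0.0)   # relative risk of 1

print(sir_hat)
print(poisson_log_likelihood(y_i, mu_mle), poisson_log_likelihood(y_i, mu_null))
```

The raw SIR uses only the region's own count, which is why it is unstable for small areas; the random effects in step 2 pull each estimate toward the overall and neighborhood levels.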
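Focus on the random effect $\varepsilon_i^{s}$ that captures local spatial autocorrelation.

$\varepsilon_i^{s}$ is distributed conditionally on location, such that

$$[\varepsilon_i^{s} \mid \varepsilon_j^{s},\ j \neq i] \sim N(\bar{\varepsilon}_i, \sigma_i^2)$$

and

$$\bar{\varepsilon}_i = \frac{\sum_j w_{ij} \varepsilon_j^{s}}{\sum_j w_{ij}}, \qquad \sigma_i^2 = \frac{\sigma_s^2}{\sum_j w_{ij}}$$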
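A small sketch of the conditional distribution for the spatially structured effect, assuming binary adjacency weights and made-up neighbor values. The function name `car_conditional` and the example numbers are hypothetical.

```python
def car_conditional(i, eps_s, w, sigma2_s):
    """Conditional mean and variance of eps_s[i] given the other regions.

    The mean is the weighted average of the neighbors' effects; the variance
    is sigma2_s divided by the total weight of region i's neighbors.
    """
    others = [j for j in range(len(eps_s)) if j != i]
    w_total = sum(w[i][j] for j in others)
    mean = sum(w[i][j] * eps_s[j] for j in others) / w_total
    variance = sigma2_s / w_total
    return mean, variance


# Hypothetical layout: four regions in a row with binary adjacency weights.
w = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
eps_s = [1.0, 2.0, 3.0, 4.0]   # made-up current values of the spatial effects

print(car_conditional(1, eps_s, w, sigma2_s=1.0))   # (2.0, 0.5): two neighbors
print(car_conditional(0, eps_s, w, sigma2_s=1.0))   # (2.0, 1.0): one neighbor
```

A region with more neighbors has a smaller conditional variance, so its effect is tied more tightly to the local average.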
[Figure: A Directed Acyclic Graph of the Bayesian Model. Nodes: X (covariate), beta, alpha, tau.u, tau.s, epsilon.u, epsilon.s[i], E[i], mu[i], y[i], within a plate labeled for(i IN 1 : n).]
Gibbs sampling basic procedure
- All stochastic parameters in the model are assigned an initial value (somewhat arbitrarily).
- The values for each parameter are updated by random simulation from a conditional probability distribution, given all other parameters in the model.
- After all terms have been updated, completing one cycle (of what is called a Markov Chain), the cycle is repeated.
- After many iterations, the simulated values for each term converge to a stationary posterior distribution (further iterations don’t change the distribution).
Estimation and inference can then be made from these posterior distributions.
For example, a simulated sample of 1000 fitted SIR values (μi / Ei) can be used to yield a point estimate (typically the median) and an interval estimate, such as the 95 %-tile range (credible set).
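The cycle described above can be sketched on a much simpler model than the spatial one. The code below is an assumption-laden toy: a normal sample with unknown mean `mu` and precision `tau`, a N(0, 10^4) prior on the mean (read as a variance), and a Gamma(0.5, 0.0005) prior on the precision. The data, prior values, and function names are all invented; WinBUGS performs this kind of updating automatically for the full hierarchical model.

```python
import random


def gibbs_normal(data, iterations=2000, burn_in=1000, seed=1):
    """Gibbs sampler for the mean of a normal sample with unknown precision."""
    rng = random.Random(seed)
    n, total = len(data), sum(data)
    prior_precision = 1e-4      # mu ~ N(0, 10^4), treating 10^4 as a variance
    a, b = 0.5, 0.0005          # tau ~ Gamma(a, b), assumed hyperparameters
    mu, tau = 0.0, 1.0          # somewhat arbitrary initial values
    draws = []
    for step in range(iterations):
        # Update mu from its conditional distribution given tau and the data.
        precision = n * tau + prior_precision
        mu = rng.gauss(tau * total / precision, precision ** -0.5)
        # Update tau from its conditional distribution given mu and the data.
        ss = sum((v - mu) ** 2 for v in data)
        tau = rng.gammavariate(a + n / 2, 1.0 / (b + ss / 2))
        # Keep draws only after the burn-in cycles.
        if step >= burn_in:
            draws.append(mu)
    return draws


def percentile(sorted_values, p):
    """Value at fraction p of the way through an already sorted list."""
    index = min(int(p * len(sorted_values)), len(sorted_values) - 1)
    return sorted_values[index]


data = [0.9, 1.1, 1.0, 1.2, 0.8, 1.05, 0.95, 1.0]   # made-up observations
draws = sorted(gibbs_normal(data))
median = percentile(draws, 0.5)
lower, upper = percentile(draws, 0.025), percentile(draws, 0.975)
print(median, (lower, upper))
```

The 1000 retained draws play the role of the simulated SIR sample: the median is the point estimate and the 2.5th to 97.5th percentile range is the 95% credible interval.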
[Figure: distribution of simulated SIR values, marking the 5th, 50th, and 95th percentiles.]
An illustration for geospatial analysis of prostate cancer incidence in New York State, USA …
Prostate Cancer Incidence by ZIP code, adjusted for age and race, New York State, 1994-1998
[Figure: Example Output: Posterior Kernel Densities of Prostate Cancer Incidence (`94-`98) for Some Manhattan ZIP Codes. Six panels, SIRhat[25] through SIRhat[30], each based on a sample of 1000.]
some references
• Waller, L.A. and Gotway, C.A. 2004. Applied Spatial Statistics for Public Health Data. Wiley. 494 pp.
• Johnson, G.D. 2004. Smoothing Small Area Maps of Prostate Cancer Incidence in New York State (USA) using Fully Bayesian Hierarchical Modelling. International Journal of Health Geographics 2004, 3:29 ( http://www.ij-healthgeographics.com/content/3/1/29 )
• Elliott, P., Wakefield, J.C., Best, N.G. and Briggs, D.J. 2000. Spatial Epidemiology: Methods and Applications. Oxford. 475 pp.
• Statistics in Medicine. 2000. Vol. 19 (special issue on disease mapping)
• Lawson, A. et al. 1999. Disease Mapping and Risk Assessment for Public Health. Wiley. 482 pp.
Method and Software Sources
GeoDa
http://geodacenter.asu.edu/
(with links to R and R-Geo)
WinBUGS for Bayesian Modeling
http://www.mrc-bsu.cam.ac.uk/bugs/welcome.shtml
Both of these free packages are supported by a large international community with active listservs.