Model Based Geostatistics Archie Clements University of Queensland School of Population Health.

Page 1

Model Based Geostatistics

Archie Clements

University of Queensland

School of Population Health

Page 2

Overview

• Introduction to geostatistics
– Assumptions
– Variogram components
– Variogram models
– Kriging
– Assumptions

• Model-based geostatistics
– Principles
– Building the model
– Prediction
– Validation

• Applications: parasitic disease control in Africa

Page 3

Spatial variation

[Figure: three-dimensional plot with axes X, Y and Z.]

Page 4

First and second order variation

• First-order variation:
– Trend
– Large-scale variation
– Can be due to large-scale environmental drivers (e.g. temperature for vector-borne diseases)

• Second-order variation:
– Localised variation: clustering
– Modelled using geostatistics

Page 5

Spatial dependence

• Observations close in space are more similar than observations far apart

• The variance of pairs of observations that are close together (small h) tends to be smaller than the variance of pairs far apart (large h)

• Basis of the semivariogram
– Spatial decomposition of the sample variance

Page 6

Semivariance: statistical notation

Function of distance (and direction); distance in bins, direction in sectors of compass – “azimuth”

Semivariance is half the average squared difference of values observed at locations separated by a given distance (and direction)
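The verbal definition above can be sketched in code. This is a minimal omnidirectional version of the usual empirical estimator, half the mean squared difference over all pairs in each distance bin. The function name, bin edges and toy data are illustrative assumptions, not taken from the slides, and direction (azimuth) is ignored.

```python
import numpy as np

def empirical_semivariogram(coords, values, bin_edges):
    """Omnidirectional empirical semivariogram.

    For each distance bin, returns half the mean squared difference
    between the values of all point pairs whose separation falls in
    that bin, together with the number of pairs used.
    """
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    # Indices of every unordered pair (i < j)
    i, j = np.triu_indices(len(values), k=1)
    dist = np.linalg.norm(coords[i] - coords[j], axis=1)
    sqdiff = (values[i] - values[j]) ** 2

    n_bins = len(bin_edges) - 1
    gamma = np.full(n_bins, np.nan)   # NaN where a bin holds no pairs
    counts = np.zeros(n_bins, dtype=int)
    for b in range(n_bins):
        in_bin = (dist >= bin_edges[b]) & (dist < bin_edges[b + 1])
        counts[b] = in_bin.sum()
        if counts[b] > 0:
            gamma[b] = 0.5 * sqdiff[in_bin].mean()
    return gamma, counts

# Toy data (hypothetical): four points on a line, one unit apart
coords = [(0, 0), (1, 0), (2, 0), (3, 0)]
values = [1.0, 2.0, 4.0, 7.0]
gamma, counts = empirical_semivariogram(coords, values, [0.5, 1.5, 2.5, 3.5])
```

In the toy data the semivariance rises with lag, which is the pattern expected when nearby observations are more alike than distant ones.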

Page 7

Modelling spatial correlation: semivariogram

[Figure: semivariogram, semivariance (y-axis) against lag h (x-axis), with the nugget, partial sill and sill marked.]

Page 8

Nugget

• Random variation (white noise); non-spatial measurement error

• Microvariation (spatial variation at a scale smaller than the smallest bin)

• If no spatial correlation:
– Nugget = sill (flat semivariogram)

Page 9

Semivariogram: decisions to be made

• How many/what sized bins?
– Depends on density of data points
– For regularly spaced (grid-sampled) data, bin size = size of cells in the grid
– For irregular sampling, modify according to range of spatial correlation (big range, big bins; small range, small bins)

• What maximum lag (h) to use?
– Should be estimated up to half the length of the shortest side of the study area

• Which parametric model to use?
– Visual fit
– Statistical fit

Page 10

Variogram models
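The figure for this slide is not preserved in the transcript. As an illustration only, three parametric forms commonly fitted to empirical semivariograms are the spherical, exponential and Gaussian models. The parameterisation below (nugget, partial sill `psill`, range parameter `a`) is one common convention and an assumption here, not necessarily the one shown on the slide.

```python
import numpy as np

def spherical(h, nugget, psill, a):
    """Spherical model: reaches the sill (nugget + psill) exactly at h = a."""
    h = np.asarray(h, dtype=float)
    rising = nugget + psill * (1.5 * h / a - 0.5 * (h / a) ** 3)
    g = np.where(h < a, rising, nugget + psill)
    return np.where(h == 0, 0.0, g)  # semivariance is zero at lag zero

def exponential(h, nugget, psill, a):
    """Exponential model: approaches the sill asymptotically."""
    h = np.asarray(h, dtype=float)
    g = nugget + psill * (1.0 - np.exp(-h / a))
    return np.where(h == 0, 0.0, g)

def gaussian(h, nugget, psill, a):
    """Gaussian model: parabolic near the origin, asymptotic sill."""
    h = np.asarray(h, dtype=float)
    g = nugget + psill * (1.0 - np.exp(-(h / a) ** 2))
    return np.where(h == 0, 0.0, g)

# Hypothetical parameter values, chosen only to exercise the functions
lags = np.array([0.0, 5.0, 10.0, 20.0])
sph = spherical(lags, nugget=0.2, psill=0.8, a=10.0)
exp_ = exponential(lags, nugget=0.2, psill=0.8, a=10.0)
gau = gaussian(lags, nugget=0.2, psill=0.8, a=10.0)
```

With these conventions the jump from 0 at lag zero to the nugget just beyond it represents the nugget effect, and a flat curve (partial sill of zero) corresponds to no spatial correlation.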

Page 11

Schistosoma mansoni, Uganda

Omnidirectional semivariograms

Page 12

Anisotropy

• Spatial dependence is different in different directions
– Semivariogram calculated in one direction is different from semivariogram calculated in another direction
– Should check for anisotropy and, if present, accommodate it in interpolation
– Range or sill (or both) can differ

Page 13

Schistosoma mansoni, Uganda: directional semivariograms

Direction          Range (km)   Sill    Nugget
Omni-directional   43.4         7E-2    4E-2
0°                 39.4         1E-1    -3E-3
45°                43.6         7E-2    2E-2
90°                35.8         8E-2    3E-2
135°               39.5         1E-1    2E-2

Page 14

Schistosoma haematobium, Northwestern Tanzania

Direction          Range (km)   Sill    Nugget
Omni-directional   36.0         5E-2    0
0°                 260.1        2E-2    3E-2
45°                163.9        6E-3    3E-2
90°                56.2         5E-2    0
135°               97.7         3E-2    7E-3

Page 15

Schistosoma haematobium, Northwestern Tanzania

Page 16

Trended and skewed data

• Data should be de-trended
– Polynomials (regression on XY coordinates)
– Generalised linear models (regression on covariates)
– Generalised additive models (can over-fit)
– If directional variograms are calculated and the range in one direction is >3× the range in the perpendicular direction, this is a sign of trend

• If skewed, consider transformation (e.g. log transformation, normal score transformation)
– Otherwise, extreme values overly influence the interpolated map
– Have to back-transform interpolated values
– Called “disjunctive Kriging”
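The >3× rule of thumb for trend can be checked against the directional ranges tabulated earlier for Uganda and northwestern Tanzania. The sketch below pairs perpendicular directions (0° with 90°, 45° with 135°) and reports the larger ratio; the helper name is hypothetical and the pairing is an assumption about how the rule would be applied.

```python
def max_perpendicular_range_ratio(ranges_km):
    """Largest ratio between fitted ranges in perpendicular directions.

    ranges_km maps azimuth in degrees (0, 45, 90, 135) to the fitted
    semivariogram range in that direction.
    """
    ratios = []
    for a, b in [(0, 90), (45, 135)]:
        longer = max(ranges_km[a], ranges_km[b])
        shorter = min(ranges_km[a], ranges_km[b])
        ratios.append(longer / shorter)
    return max(ratios)

# Directional ranges (km) from the two tables
uganda = {0: 39.4, 45: 43.6, 90: 35.8, 135: 39.5}       # S. mansoni
tanzania = {0: 260.1, 45: 163.9, 90: 56.2, 135: 97.7}   # S. haematobium

ratio_uganda = max_perpendicular_range_ratio(uganda)
ratio_tanzania = max_perpendicular_range_ratio(tanzania)
for name, ratio in [("Uganda", ratio_uganda), ("NW Tanzania", ratio_tanzania)]:
    print(f"{name}: max ratio {ratio:.2f}, possible trend: {ratio > 3}")
```

On these figures the Uganda ranges differ by about 10% between perpendicular directions, while the Tanzanian 0° range is more than four times the 90° range, which by the rule above would suggest a trend.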

Page 17

Non-stationarity

• Spatial correlation structure cannot be generalised to the whole study area

• Why does it occur?
– Different factors may operate in different parts of the study area
– Different ecological zones with different disease epidemiology

• Need to estimate the spatial correlation structure separately in each homogeneous zone

Page 18

Kriging

Z(si) is the measured value at the ith location

λi is the weight attributed to the measured value at the ith location (calculated using the semivariogram)

s0 is the prediction location

For formulae on how the weights are estimated using the variogram: http://en.wikipedia.org/wiki/Kriging

The prediction standard error (or variance) gives an indication of the precision of the prediction
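With the notation above, the kriging prediction is the weighted sum of the measured values, Ẑ(s0) = Σ λi Z(si). The sketch below shows one standard way to obtain the weights, ordinary kriging, in which the weights are constrained to sum to one through a Lagrange multiplier. The function names and the exponential semivariogram with default parameters are assumptions for illustration, not the slide's own formulation.

```python
import numpy as np

def exp_semivariogram(h, nugget=0.0, psill=1.0, a=1.0):
    """Exponential semivariogram (hypothetical parameter values)."""
    h = np.asarray(h, dtype=float)
    return np.where(h == 0, 0.0, nugget + psill * (1.0 - np.exp(-h / a)))

def ordinary_kriging(coords, z, s0, semivariogram):
    """Predict at s0 from values z observed at coords.

    Solves the ordinary kriging system
        [Gamma 1] [lambda]   [gamma0]
        [1^T   0] [  m   ] = [  1   ]
    and returns the prediction, kriging variance and weights.
    """
    coords = np.asarray(coords, dtype=float)
    z = np.asarray(z, dtype=float)
    n = len(z)
    # Pairwise distances between data points, and from data points to s0
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    d0 = np.linalg.norm(coords - np.asarray(s0, dtype=float), axis=1)

    A = np.ones((n + 1, n + 1))
    A[:n, :n] = semivariogram(d)
    A[n, n] = 0.0
    b = np.ones(n + 1)
    b[:n] = semivariogram(d0)

    sol = np.linalg.solve(A, b)
    weights, lagrange = sol[:n], sol[n]
    prediction = weights @ z
    variance = weights @ b[:n] + lagrange
    return prediction, variance, weights

# Toy data (hypothetical): four corners of a unit square
coords = [(0, 0), (1, 0), (0, 1), (1, 1)]
z = [1.0, 2.0, 3.0, 4.0]
pred, var, w = ordinary_kriging(coords, z, (0.5, 0.5), exp_semivariogram)
```

At the centre of the square all four points are equidistant, so they receive equal weight; predicting at an observed location with zero nugget returns the observed value with zero variance.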

Page 19

Geostatistics summary

• Geostatistics involves 3 steps:
– Exploratory data analysis
– Definition of a variogram
– Using the variogram for interpolation (Kriging)

• Technique applicable for:
– Point-referenced data
– Spatially continuous processes:
  • Disease risk
  • Rainfall, elevation, temperature, other climate variables
  • Wildlife, vegetation, geology (mineral deposits)

Page 20

Bayesian model-based geostatistics

Seminal paper: Diggle, Tawn and Moyeed (1998). Model-based geostatistics. Appl. Stat. 47(3):299-350

Observed a need for addressing non-Gaussian observational error

Idea is “to embed linear Kriging methodology within a more general distributional framework”

Generalised linear models with an unobserved Gaussian process in the linear predictor

Implemented in a Bayesian framework

Page 21

Advantages of the Bayesian approach

• Natural framework for incorporation of parameter uncertainty into spatial prediction
– Can build uncertainty into parameters using priors
  • Non-informative
  • Informative (based on exploratory analysis, additional sources of information)

• Convenient for modelling hierarchical data structures

Page 22

Bayesian model-based geostatistics

Page 23

Predictions

• Can predict at specified validation locations (with observed outcomes for comparison)

• Can predict at non-sampled locations, e.g. a prediction grid

• Might be interested in
– Outcome
– Spatial random effect
– Standard error of predicted outcome

Page 24

Validation

• Jack-knifing; sampling with replacement
– Remove one observation, do prediction at that location and store predicted value
– Repeat for all observations
– Compare predicted to observed using statistical measures of fit (RMSE) and discriminatory performance (AUC)
– Not feasible with MBG other than with very small datasets

• Cross-validation; sampling without replacement
– Set aside a subset for validation (ideally 50%)
– Use remaining data to “train” model
– Compare predicted and observed for the validation subset using statistical measures
– Can then recombine the validation and training subsets for final model build

• External validation: using other prospective or retrospective dataset
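The comparison of predicted and observed values can be sketched as follows. RMSE is computed directly, and AUC is computed as the probability that a randomly chosen positive location scores higher than a randomly chosen negative one (ties count half). The function names, the 50% split helper and the toy prevalences are illustrative assumptions; the model-fitting step itself is not shown.

```python
import numpy as np

def rmse(observed, predicted):
    """Root mean squared error between observed and predicted values."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((observed - predicted) ** 2)))

def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[labels], scores[~labels]
    diff = pos[:, None] - neg[None, :]
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return float(wins / (len(pos) * len(neg)))

def train_validation_split(n, validation_fraction=0.5, seed=0):
    """Random split of n locations, sampled without replacement."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    n_val = int(round(n * validation_fraction))
    return idx[n_val:], idx[:n_val]   # training indices, validation indices

# Toy validation subset (hypothetical prevalences)
observed = np.array([0.1, 0.6, 0.3, 0.8])
predicted = np.array([0.2, 0.5, 0.4, 0.7])
fit = rmse(observed, predicted)
# Discrimination of locations with observed prevalence above 50%
discrimination = auc(observed > 0.5, predicted)
train_idx, val_idx = train_validation_split(153)
```

Here the AUC is computed against a dichotomised outcome (observed prevalence above 50%), which is one possible choice of threshold for assessing discriminatory performance.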

Page 25

Model-based geostatistics summary

Model-based geostatistics involves:

1. Visual and exploratory data analysis

2. Variography (to determine if there is second-order spatial variation)

3. Variable selection (for deterministic component)

4. Building model (e.g. in WinBUGS)

5. Model selection (e.g. using DIC)

6. Prediction and validation

Page 26

Application: Schistosomiasis in Sub-Saharan Africa

Page 27

Schistosomiasis

779 million people at risk

207 million infected

Most in Africa

Significant illness and mortality

Two main forms in Africa:

Urinary schistosomiasis caused by Schistosoma haematobium

Intestinal schistosomiasis caused by S. mansoni

Page 28

Life cycle of Schistosoma haematobium

[Figure: life-cycle diagram. Labels: adult worm in human bladder wall; eggs in urine; miracidia; sporocysts in snail; cercariae released.]

Page 29

Diagnosis of infection

S. haematobium:

Microscopic examination of urine slides: Presence of eggs and egg counts

Macrohaematuria (visible blood)

Microhaematuria (invisible blood) – tested using chemical reagent strips

Blood in urine questionnaire

S. mansoni and soil-transmitted helminths:

microscopic examination of stool samples

Page 30

School-based control programmes

• School-aged children have highest prevalence (proportion infected) and intensity (severity) of infection

• Education system is convenient for control; central location to access target population

Page 31

How do we determine which schools should be targeted?

• No surveillance

• Need to do surveys

• World Health Organisation guidelines: treat communities every two years where prevalence in school-age children is >10% and annually where prevalence is >50%
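The threshold rule above can be written as a small decision function. This is a sketch that assumes the lower band means treatment once every two years and uses strict inequalities as stated on the slide; the function name and return labels are hypothetical.

```python
def treatment_strategy(prevalence):
    """Map prevalence in school-age children (proportion, 0 to 1)
    to the treatment frequency implied by the thresholds above."""
    if not 0.0 <= prevalence <= 1.0:
        raise ValueError("prevalence must be a proportion between 0 and 1")
    if prevalence > 0.50:
        return "annual"
    if prevalence > 0.10:
        return "every two years"
    # The slide gives no rule for prevalence at or below 10%
    return "below threshold"

# Hypothetical school prevalences
strategies = [treatment_strategy(p) for p in (0.05, 0.25, 0.65)]
```

Applied to predicted prevalence at non-sampled locations, a rule of this kind turns a prevalence map into a map of recommended treatment frequency.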

Page 32

Field survey: northwest Tanzania

Lake Victoria

153 schools surveyed

60 children per school

What about non-sampled locations? Need to predict (interpolate) values

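Page 33

MBG model for S. haematobium prevalence

$Y_i \sim \mathrm{Binomial}(n_i, p_i)$

$\mathrm{logit}(p_i) = \alpha + \beta \, \mathrm{rain}_i + \gamma_1 \, \mathrm{LST}_{1i} + \gamma_2 \, \mathrm{LST}_{2i} + \theta_i$

$f(d_{ij}; \phi) = \exp(-\phi \, d_{ij})$

where $Y_i$ is the number of infected children out of $n_i$ examined at location $i$, rain and LST are the rainfall and land surface temperature categories, $\theta_i$ is the spatial random effect, and $d_{ij}$ is the distance between locations $i$ and $j$.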
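To make the structure of such a model concrete, the sketch below simulates data from a simplified version: a binomial outcome with a logit link, an intercept only (covariates omitted), and a zero-mean Gaussian spatial random effect whose correlation decays exponentially with distance. All parameter values and coordinates are arbitrary assumptions; only the 153 schools and 60 children per school come from the field survey slide. This is a simulation, not the Bayesian fitting procedure.

```python
import numpy as np

def simulate_mbg_binomial(coords, n_children, intercept, sill, phi, seed=0):
    """Simulate infected counts from a binomial model with a
    spatially correlated random effect on the logit scale.

    Covariance between locations i and j is sill * exp(-phi * d_ij).
    """
    rng = np.random.default_rng(seed)
    coords = np.asarray(coords, dtype=float)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    cov = sill * np.exp(-phi * d)
    # Small jitter on the diagonal for a numerically stable Cholesky factor
    chol = np.linalg.cholesky(cov + 1e-10 * np.eye(len(coords)))
    theta = chol @ rng.standard_normal(len(coords))   # spatial random effect
    p = 1.0 / (1.0 + np.exp(-(intercept + theta)))    # inverse logit
    y = rng.binomial(n_children, p)                   # infected per school
    return y, p, theta

# Hypothetical school coordinates (km) in a 100 x 100 km area
rng = np.random.default_rng(1)
coords = rng.uniform(0, 100, size=(153, 2))
y, p, theta = simulate_mbg_binomial(
    coords, n_children=60, intercept=-1.0, sill=1.0, phi=0.05
)
```

Larger values of `phi` make the correlation decay faster with distance, so simulated prevalence becomes patchier; `sill` controls how much the random effect varies overall.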

Page 34

S. haematobium model results

Variable             Coefficient          Odds Ratio
Intercept            1.9 (-2.3 - 10.3)    –
LST >35-39 °C        0.4 (-0.3 - 1.1)     1.5 (0.8 - 2.9)
LST >39 °C           0.3 (-1.5 - 2.2)     1.4 (0.2 - 8.6)
Rainfall >1050 mm    -1.1 (-3.4 - 1.1)    0.3 (3.3 × 10^-2 - 3.1)
κ                    0.9 (0.6 - 1.3)      –
φ                    0.2 (0.1 - 1.0)      –
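Because the model is on the logit scale, each odds ratio is the exponential of its coefficient. The short check below exponentiates the tabulated coefficients and compares them with the tabulated odds ratios; the small differences are presumably due to the coefficients being rounded to one decimal place, which is an assumption.

```python
import math

# Posterior mean coefficients (log-odds scale) and the odds ratios
# reported alongside them in the results table.
coefficients = {"LST >35-39C": 0.4, "LST >39C": 0.3, "Rainfall >1050mm": -1.1}
reported_or = {"LST >35-39C": 1.5, "LST >39C": 1.4, "Rainfall >1050mm": 0.3}

odds_ratios = {name: math.exp(b) for name, b in coefficients.items()}
for name, value in odds_ratios.items():
    print(f"{name}: exp({coefficients[name]}) = {value:.2f} "
          f"(reported {reported_or[name]})")
```

An odds ratio whose interval spans 1 (equivalently, a coefficient whose interval spans 0) indicates that the direction of the association is uncertain, which is the case for all three covariates in this table.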

Page 35

Clements et al. TMIH 2006

Page 36

Uncertainty

Lower bound: 95% PI

Upper bound: 95% PI
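Uncertainty summaries of this kind are typically computed location by location from posterior draws of predicted prevalence. The sketch below assumes the draws are available as an array with one row per draw and one column per prediction location; the function name and toy values are hypothetical. It returns the posterior mean, the 2.5th and 97.5th percentiles as interval bounds, and the proportion of draws exceeding a threshold.

```python
import numpy as np

def summarise_posterior_prevalence(draws, threshold=0.5):
    """Summarise posterior draws of prevalence per location.

    draws: array of shape (n_draws, n_locations), values in [0, 1].
    """
    draws = np.asarray(draws, dtype=float)
    lower, upper = np.percentile(draws, [2.5, 97.5], axis=0)
    return {
        "mean": draws.mean(axis=0),
        "lower": lower,                              # lower 95% bound
        "upper": upper,                              # upper 95% bound
        "p_exceed": (draws > threshold).mean(axis=0),  # P(prevalence > threshold)
    }

# Toy draws (hypothetical): 4 draws at 2 locations
draws = np.array([[0.2, 0.6],
                  [0.4, 0.8],
                  [0.6, 0.9],
                  [0.8, 0.7]])
summary = summarise_posterior_prevalence(draws, threshold=0.5)
```

Mapping `p_exceed` rather than the mean shows where a control threshold is exceeded with high probability and where the data leave the decision uncertain.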

Page 37

Probability that prevalence is >50%

Clements et al. EID 2008

Co-ordinated surveys in 3 contiguous countries
• 418 schools
• >26,000 children

Variable                                       Mean (95% CI)        SD
Sex: Female                                    0.70 (0.65, 0.76)    0.03
Age: 9–10 years                                1.16 (1.00, 1.33)    0.08
Age: 11–12 years                               1.51 (1.31, 1.73)    0.10
Age: 13–16 years                               1.79 (1.53, 2.06)    0.14
Distance to perennial water body               0.34 (0.21, 0.54)    0.08
Land surface temperature                       0.80 (0.51, 1.21)    0.18
Land surface temperature²                      1.10 (0.85, 1.40)    0.14
Rate of decay of spatial correlation           2.03 (1.48, 2.74)    0.32
Variance of the spatial random effect (sill)   7.03 (5.36, 9.31)    1.01

Page 38

Other outcomes: co-infection

East Africa: Brooker and Clements, Int. J. Parasitol., in press

[Figure: map of survey points in Kenya, Tanzania and Uganda, around Lake Victoria and Lake Albert, by infection status (no infection, S. mansoni monoinfection, hookworm monoinfection, coinfection); large water bodies and country borders shown; scale bar in kilometres.]

[Figure: percentage of schools (y-axis, 0 to 50) against % infected (x-axis, 0 to 100) for S. mansoni mono-infection, hookworm mono-infection and co-infection.]

S. mansoni mono-infection: 7.9%

Hookworm mono-infection: 40.5%

Co-infection: 8.1%

Page 39

Model for co-infection

$Y_{ijk} \sim \mathrm{Multinomial}(p_{ijk}, n_{ijk})$

$f(d_{ij}; \phi) = \exp(-\phi \, d_{ij})$

[The remaining equations, a log-linear predictor in covariates x for the outcome probabilities $p_{ijk}$, are not recoverable from the extracted text.]

Page 40

Posterior mean (95% posterior CI) by outcome

Variable                  S. mansoni mono-infection   Hookworm mono-infection   S. mansoni/hookworm co-infection
Intercept                 -3.8 (-4.7 - -2.9)          -0.6 (-1.1 - -0.3)        -4.4 (-5.0 - -3.7)
OR: Elevation             0.35 (0.22 - 0.58)          0.77 (0.65 - 0.89)        0.30 (0.20 - 0.47)
OR: DPWB                  0.23 (0.10 - 0.45)          0.94 (0.76 - 1.15)        0.30 (0.18 - 0.58)
OR: Rural vs urban        0.43 (0.21 - 0.79)          0.98 (0.68 - 1.37)        0.61 (0.36 - 1.02)
OR: Ext. rural vs urban   0.62 (0.23 - 1.44)          1.16 (0.82 - 1.81)        0.75 (0.31 - 1.62)
OR: LST                   0.88 (0.62 - 1.25)          0.60 (0.50 - 0.72)        0.57 (0.31 - 0.87)
OR: Female                0.86 (0.76 - 0.96)          0.91 (0.86 - 0.97)        0.70 (0.63 - 0.77)
OR: Age (9-10 years)      1.67 (1.37 - 2.06)          1.17 (1.04 - 1.30)        1.82 (1.52 - 2.21)
OR: Age (11-13 years)     2.44 (2.06 - 2.89)          1.55 (1.39 - 1.71)        2.99 (2.55 - 3.52)
OR: Age (≥14 years)       2.87 (2.19 - 3.71)          1.88 (1.63 - 2.14)        3.83 (3.01 - 4.86)
Phi (rate of decay)       3.52 (1.73 - 7.21)          4.98 (3.38 - 7.33)        3.76 (2.10 - 7.36)
Sill                      6.39 (3.52 - 11.78)         1.31 (0.98 - 1.76)        6.34 (3.98 - 9.95)

Page 41

Co-infection

S. mansoni monoinfection

Hookworm monoinfection

S. mansoni - Hookworm coinfection

Page 42

Other outcomes: Intensity of infection

Prevalence is used (currently) for disease control planning

Intensity of infection (eggs/ml urine or /g faeces) is more indicative of:
– Morbidity (anaemia, urinary tract, hepatic pathology)
– Transmission

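Page 43

Model for intensity of infection

$Y_{ij} \sim \mathrm{NegBin}(\mu_{ij})$

$\log(\mu_{ij}) = \beta \, \mathrm{girl}_{ij} + u_j$

$u_j = \alpha + \gamma_1 \, \mathrm{dist}_j + \gamma_2 \, \mathrm{elev}_j + \theta_j$

$f(d_{jk}; \phi) = \exp(-\phi \, d_{jk})$

where $i$ indexes children and $j$ schools, girl is the child's sex, dist and elev are the school's distance to a perennial water body and elevation, $\theta_j$ is the school-level spatial random effect, and $d_{jk}$ is the distance between schools $j$ and $k$.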

Page 44

Intensity of S. mansoni infection, East Africa

Clements et al. Parasitol 2006

Variable          Posterior Mean (95% CI)
Intercept         10.06 (5.77 - 13.22)
Female            -0.41 (-0.72 - -0.11)
Elevation (m)     -0.007 (-0.01 - -0.004)
DPWB (dec deg)    -5.36 (-7.51 - -3.30)
Sill              23.96 (19.06 - 32.07)
Range             0.134 (0.09 - 0.20)
Overdispersion    0.06 (0.058 - 0.062)

Page 45

Conclusions

• In disease control we need an evidence-based framework for deciding where to allocate limited control resources

• Maps are useful tools for highlighting sub-national variation; targeting interventions; advocacy (national and local); integrated control programmes; estimating heterogeneities in disease burden

• Model-based geostatistics enables rich inference from spatial data; uncertainty