Statistical modelling and inference for spatio-temporal ...mhazelto/talks/IDReC-2012.pdf ·...

55
Statistical modelling and inference for spatio-temporal disease processes Martin Hazelton 1 Statistics and Bioinformatics Group Institute of Fundamental Sciences, Massey University 24 October 2012 1 Presenter: [email protected] IDReC Symposium October 2012, Palmerston North 1 / 31

Transcript of Statistical modelling and inference for spatio-temporal ...mhazelto/talks/IDReC-2012.pdf ·...

Statistical modelling and inference forspatio-temporal disease processes

Martin Hazelton1

Statistics and Bioinformatics GroupInstitute of Fundamental Sciences, Massey University

24 October 2012

1Presenter: [email protected]

IDReC Symposium October 2012, Palmerston North 1 / 31

Disease, Prediction and Chance

Spread of infectious disease is a complexprocess typically involving a plethora offactors.Some are explicable/predictable.Others are essentially random.Plenty of work for statisticians decidingwhich is which.

IDReC Symposium October 2012, Palmerston North 2 / 31

Spatial Patterns ... of Disease?

●●

●●

●●●●

●●

●●

●●

● ●

●●

● ●

A

●●

●●

●●

B

●●

●●

●●

● ●

● ●

● ●

●●

●●

●●

●●

●●

● ●

●●

●●

●●

●●

● ●

●●

●●

● ●

●●

●●

●●

●●

●●

●●

C

Complete spatial randomnessLocations of gastroenteritis casesMystery spatial point pattern

IDReC Symposium October 2012, Palmerston North 3 / 31

Spatial Patterns ... of Disease?

●●

●●

●●●●

●●

●●

●●

● ●

●●

● ●

Gastroenteritis cases

●●

●●

●●

Complete spatial randomness

●●

●●

●●

● ●

● ●

● ●

●●

●●

●●

●●

●●

● ●

●●

●●

●●

●●

● ●

●●

●●

● ●

●●

●●

●●

●●

●●

●●

Sea anemone locations

Complete spatial randomness (B)Locations of gastroenteritis cases (A)Mystery spatial point pattern (C)

IDReC Symposium October 2012, Palmerston North 3 / 31

Measuring Spatial Risk

To understand spatial patterns of disease we must adjust forunderlying population density.To this end, Bithell (1990) defined the relative risk function:

r(z) =f (z)g(z)

z ∈ R

where f is density of cases and g density of controls.Usual to work on log-scale:

ρ(z) = log{r(z)} = log{f (z)} − log{g(z)}

Bithell J.F. (1990). An application of density estimation to geographical epidemiology. Statistics in

Medicine 9:691–701.

IDReC Symposium October 2012, Palmerston North 4 / 31

Kernel Estimation

Straightforward approach to estimation of r is to replacingunknown densities by kernel estimates thereof:

r(z) =f (z)g(z)

z ∈ R

Kernel density estimate constructed from bivariate datax1, . . . ,xn:

f (z) = n−1n∑

i=1

Kh(z − x i)

I Kh(z) = h−2K (z/h), with kernel K being spherically symmetric;I h is the bandwidth, controlling degree of smoothing.

IDReC Symposium October 2012, Palmerston North 5 / 31

Constructing Kernel Density EstimatesAn example using the gastroenteritis data

IDReC Symposium October 2012, Palmerston North 6 / 31

Constructing Kernel Density EstimatesAn example using the gastroenteritis data

IDReC Symposium October 2012, Palmerston North 6 / 31

Constructing Kernel Density EstimatesAn example using the gastroenteritis data

●●

IDReC Symposium October 2012, Palmerston North 6 / 31

Constructing Kernel Density EstimatesAn example using the gastroenteritis data

IDReC Symposium October 2012, Palmerston North 6 / 31

Constructing Kernel Density EstimatesAn example using the gastroenteritis data

●●

IDReC Symposium October 2012, Palmerston North 6 / 31

Constructing Kernel Density EstimatesAn example using the gastroenteritis data

IDReC Symposium October 2012, Palmerston North 6 / 31

Constructing Kernel Density EstimatesAn example using the gastroenteritis data

●●

IDReC Symposium October 2012, Palmerston North 6 / 31

Constructing Kernel Density EstimatesAn example using the gastroenteritis data

IDReC Symposium October 2012, Palmerston North 6 / 31

Constructing Kernel Density EstimatesAn example using the gastroenteritis data

●●

●●

IDReC Symposium October 2012, Palmerston North 6 / 31

Constructing Kernel Density EstimatesAn example using the gastroenteritis data

IDReC Symposium October 2012, Palmerston North 6 / 31

Constructing Kernel Density EstimatesAn example using the gastroenteritis data

●●

●●●●

●●

●●

● ●

●●

●●

●●

●●●●

●●

●●●

●●

●●

●●

●●●

●●

●●●

●●

● ●●

●●

●●

●●

● ●

●●

● ●

●●

●●

●●

●●

●●

●●

● ●

●●

●●

● ●

●●

●●

●●

● ● ●

● ●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

● ●

●●

●●

●●

●●

● ●

●●

● ●

●● ●●

IDReC Symposium October 2012, Palmerston North 6 / 31

Constructing Kernel Density EstimatesAn example using the gastroenteritis data

IDReC Symposium October 2012, Palmerston North 6 / 31

Smoothing of Kernel Density EstimatesSmall bandwidth (undersmoothing)

IDReC Symposium October 2012, Palmerston North 7 / 31

Smoothing of Kernel Density EstimatesSmall bandwidth (undersmoothing)

IDReC Symposium October 2012, Palmerston North 7 / 31

Smoothing of Kernel Density EstimatesSmall bandwidth (undersmoothing)

IDReC Symposium October 2012, Palmerston North 7 / 31

Smoothing of Kernel Density EstimatesSmall bandwidth (undersmoothing)

IDReC Symposium October 2012, Palmerston North 7 / 31

Smoothing of Kernel Density EstimatesLarge bandwidth (oversmoothing)

IDReC Symposium October 2012, Palmerston North 8 / 31

Smoothing of Kernel Density EstimatesLarge bandwidth (oversmoothing)

IDReC Symposium October 2012, Palmerston North 8 / 31

Smoothing of Kernel Density EstimatesLarge bandwidth (oversmoothing)

IDReC Symposium October 2012, Palmerston North 8 / 31

Smoothing of Kernel Density EstimatesLarge bandwidth (oversmoothing)

IDReC Symposium October 2012, Palmerston North 8 / 31

Example: Cancer of the Larynx in South Lancashire

Data are geographicalcoordinates for 58 casesof cancer of the larynx,and 978 controls (casesof lung cancer).Data collected between1974 and 1983 byChorley and SouthRibble Health Authorityin Lancashire, England.

● ●

●●

●●●●●

IDReC Symposium October 2012, Palmerston North 9 / 31

The Effect of Bandwidth SelectionFor the log-relative risk function

−15

−10

−5

0

h=1

−15

−10

−5

0

h=10

IDReC Symposium October 2012, Palmerston North 10 / 31

Research on Bandwidth Selection

Bandwidth selection is a challenging problem for both theoreticaland practical reasons.

I Typical inhomogeneity of populations.I Want stable estimates of risk even where data are sparse.I Asymptotic theory very delicate.

Recent progress on spatially adaptive smoothing regimens Davies& Hazelton (2010).

Davies, T.M. and Hazelton, M.L. (2010). Adaptive kernel estimation of spatial relative risk.

Statistics in Medicine 29, 2423–2437.

IDReC Symposium October 2012, Palmerston North 11 / 31

Estimation with Spatially Adaptive BandwidthsEstimation of case density

IDReC Symposium October 2012, Palmerston North 12 / 31

Estimation with Spatially Adaptive BandwidthsEstimation of case density

IDReC Symposium October 2012, Palmerston North 12 / 31

Estimation with Spatially Adaptive BandwidthsEstimation of case density

IDReC Symposium October 2012, Palmerston North 12 / 31

Estimation with Spatially Adaptive BandwidthsEstimation of case density

IDReC Symposium October 2012, Palmerston North 12 / 31

Estimation with Spatially Adaptive BandwidthsEstimation of case density

IDReC Symposium October 2012, Palmerston North 12 / 31

Estimation with Spatially Adaptive BandwidthsEstimation of case density

IDReC Symposium October 2012, Palmerston North 12 / 31

Estimation with Spatially Adaptive BandwidthsEstimation of case density

IDReC Symposium October 2012, Palmerston North 12 / 31

Estimation with Spatially Adaptive BandwidthsEstimation of log-relative risk surface

−15

−10

−5

0

IDReC Symposium October 2012, Palmerston North 13 / 31

Boundary Bias

Disease data collectedover finite region.For points near edge,some of kernel weightspills over boundary andis lost.Has significant impact onkernel estimates.Theoretical analysis isintricate.

IDReC Symposium October 2012, Palmerston North 14 / 31

Foundations of Asymptotic Analysis of Boundary Bias

IDReC Symposium October 2012, Palmerston North 15 / 31

Asymptotics with Adaptive Boundary KernelOr ... Why Postdocs are Great

bias(f (z))

h20

=[D10f (z)]2

4f (z)

[ b0

q

(a(20)

40 + 2a(11)31 + a(02)

22 + 7a(10)30 + 7a(01)

21 + 8a20

)+

b1

q

(a(20)

50 + 2a(11)41 + a(02)

32 + 9a(10)40 + 9a(01)

31 + 15a30

)+

b2

q

(a(20)

41 + 2a(12)32 + a(02)

23 + 9a(10)31 + 9a(01)

22 + 15a21

)]

+D10f (z)D01f (z)

4f (z)

[ b0

q

(2a(20)

31 + 4a(11)22 + 2a(02)

13 + 14a(10)21 + 14a(01)

12 + 16a11

)+

b1

q

(2a(20)

41 + 4a(11)22 + 2a(02)

23 + 18a(10)31 + 18a(01)

22 + 30a21

)+

b2

q

(2a(20)

32 + 4a(11)23 + 2a(02)

14 + 18a(10)22 + 18a(01)

13 + 30a12

)]

+[D01f (z)]2

4f (z)

[ b0

q

(a(20)

22 + 2a(11)13 + a(02)

04 + 7a(10)12 + 7a(01)

03 + 8a02

)+

b1

q

(a(20)

32 + 2a(11)23 + a(02)

14 + 9a(10)22 + 9a(01)

13 + 15a12

)+

b2

q

(a(20)

23 + 2a(11)14 + a(02)

05 + 9a(10)13 + 9a(01)

04 + 15a03

)]

+D20f (z)

4

[ b0

q

(2a(10)

30 + 2a(01)21 + 8a20

)+

b1

q

(2a(10)

40 + 2a(01)31 + 10a30

)+

b2

q

(2a(10)

31 + 2a(01)22 + 10a21

)]

+D11f (z)

4

[ b0

q

(4a(10)

21 + 4a(01)12 + 16a11

)+

b1

q

(4a(10)

31 + 4a(01)22 + 20a21)

)+

b2

q

(4a(10)

22 + 4a(01)13 + 20a12

)]

+D02f (z)

4

[ b0

q

(2a(10)

12 + 2a(01)03 + 8a02

)+

b1

q

(2a(10)

22 + 2a(01)13 + 10a12

)+

b2

q

(2a(10)

13 + 2a(01)04 + 10a03

)],

IDReC Symposium October 2012, Palmerston North 16 / 31

Computer Implementation

Important practical problems in spatial epidemiology can quicklylead to fascinating theoretical work.Need to make this stuff available to users.Released R package sparr. See Davies, Hazelton & Marshall(2011).

Davies, T.M., Hazelton, M.L. and Marshall, J.C. (2011). sparr: Analyzing spatial relative risk

using fixed and adaptive kernel density estimation in R. Journal of Statistical Software 39, 1–14.

IDReC Symposium October 2012, Palmerston North 17 / 31

Screenshot of R Package Sparr

IDReC Symposium October 2012, Palmerston North 18 / 31

Statistics and Bioinformatics Group Research

Methodological research with applications to infectious diseaseI Spatial statisticsI Markov chain Monte Carlo methodsI Simulation based inferenceI Smoothing methodsI Semiparametric modellingI Statistical software development

Variety of applications

IDReC Symposium October 2012, Palmerston North 19 / 31

Some Members of the Research Group

Postgraduate students

Sarojinie FernandoPhD (recently submitted)

Brigid Betz-StableinPhD (year 3)

Kate RichardsPhD (year 2)

Khair JonesPhD (year 1)

Lyndal HendenHonours

Staff with interest in infectious disease applications

Prof Martin Hazelton Dr Chris JewellA/Prof Geoff Jones Dr Jonathan Marshall

IDReC Symposium October 2012, Palmerston North 20 / 31

Some Members of the Research Group

Postgraduate students

Sarojinie FernandoPhD (recently submitted)

Brigid Betz-StableinPhD (year 3)

Kate RichardsPhD (year 2)

Khair JonesPhD (year 1)

Lyndal HendenHonours

Staff with interest in infectious disease applications

Prof Martin Hazelton Dr Chris JewellA/Prof Geoff Jones Dr Jonathan Marshall

IDReC Symposium October 2012, Palmerston North 21 / 31

Improved Spatio-Temporal Risk Estimation

●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●●

●●

●●

Local linear versus density ratio estimation of relative risk.Application to myrtle wilt in Tasmanian myrtle beech (Nothofaguscunninghamii).

IDReC Symposium October 2012, Palmerston North 22 / 31

Some Members of the Research Group

Postgraduate students

Sarojinie FernandoPhD (recently submitted)

Brigid Betz-StableinPhD (year 3)

Kate RichardsPhD (year 2)

Khair JonesPhD (year 1)

Lyndal HendenHonours

Staff with interest in infectious disease applications

Prof Martin Hazelton Dr Chris JewellA/Prof Geoff Jones Dr Jonathan Marshall

IDReC Symposium October 2012, Palmerston North 23 / 31

Modelling Progression of Eye Disease

−4

−3

−2

−1

0

(a) SPROG method

−4

−3

−2

−1

0

(b) PLR method

Using ideas from geographical disease mapping to examinespatial patterns of damage over retina.Interpretation of visual field data must account for physiology andoptics.

IDReC Symposium October 2012, Palmerston North 24 / 31

Some Members of the Research Group

Postgraduate students

Sarojinie FernandoPhD (recently submitted)

Brigid Betz-StableinPhD (year 3)

Kate RichardsPhD (year 2)

Khair JonesPhD (year 1)

Lyndal HendenHonours

Staff with interest in infectious disease applications

Prof Martin Hazelton Dr Chris JewellA/Prof Geoff Jones Dr Jonathan Marshall

IDReC Symposium October 2012, Palmerston North 25 / 31

Modelling Foot and Mouth Disease in Vietnam

Data quality issues for FMD inVietnam.Appropriate level of dataaggregation for modelling?Developing methods foridentifying anomalies

I High risk provinces.I Under-reporting.

IDReC Symposium October 2012, Palmerston North 26 / 31

Some Members of the Research Group

Postgraduate students

Sarojinie FernandoPhD (recently submitted)

Brigid Betz-StableinPhD (year 3)

Kate RichardsPhD (year 2)

Khair JonesPhD (year 1)

Lyndal HendenHonours

Staff with interest in infectious disease applications

Prof Martin Hazelton Dr Chris JewellA/Prof Geoff Jones Dr Jonathan Marshall

IDReC Symposium October 2012, Palmerston North 27 / 31

Bayesian Constrained Smoothing Problems

1.6 1.7 1.8 1.9 2.0 2.1 2.2 2.3

0.0

0.2

0.4

0.6

0.8

1.0

Rainfall (m)

Toxo

plas

mos

is p

ositi

ve

MonotonicUnconstrained

In some problems want flexible smooth fit under shape constraints.Application to toxoplasmosis data from 34 cities in El Salvador.

IDReC Symposium October 2012, Palmerston North 28 / 31

Some Members of the Research Group

Postgraduate students

Sarojinie FernandoPhD (recently submitted)

Brigid Betz-StableinPhD (year 3)

Kate RichardsPhD (year 2)

Khair JonesPhD (year 1)

Lyndal HendenHonours

Staff with interest in infectious disease applications

Prof Martin Hazelton Dr Chris JewellA/Prof Geoff Jones Dr Jonathan Marshall

IDReC Symposium October 2012, Palmerston North 29 / 31

Methods of Simulation Based Inference

Modern complex stochasticmodels often have intractabledistribution theory.Standard methods of inferenceinfeasible.We are working on improvingsimulation based approaches(e.g. ABC).Application is estimation ofeffective population sizes in Baliusing coalescent model.

IDReC Symposium October 2012, Palmerston North 30 / 31