
Page 1: Summary of A Spatial Scan Statistic by M. Kulldorff Presented by Gauri S. Datta gauri@stat.uga.edu Mid-Year Meeting February 3, 2006.

Summary of “A Spatial Scan Statistic” by M. Kulldorff

Presented by Gauri S. Datta
gauri@stat.uga.edu

Mid-Year Meeting, February 3, 2006


Background

• Scan Statistic
– A tool to detect clusters in a point process
– Naus (1965, JASA) studied it in one dimension
– Tests whether a 1-dim point process is purely random

• Point Process
– Consider a time interval [a,b] and a window A=[t,t+w] of fixed width w
– µ(A) = # of e-mails arrived in the time window A
– n(A) ≡ nA = # of junk e-mails = number of “points”
– Arrival times of junk e-mails define a “Point Process”


Main Idea in Scan Statistic

• Move a window [t,t+w] of size w < b-a over a time interval [a,b]

• Over all possible values of t, record the maximum number of points in the window

• Compare this number with cutoff points under the hypothesis of a purely random (Poisson) process
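The three steps above can be sketched in Python. This is a minimal illustration, not the procedure from Naus (1965): the function names are hypothetical, and the cutoff is approximated by Monte Carlo simulation of uniformly scattered points (conditioning on the observed number of points) rather than taken from exact tables.

```python
import numpy as np

def scan_statistic_1d(times, w):
    """Maximum number of points in any window [t, t + w]."""
    times = np.sort(np.asarray(times, dtype=float))
    if times.size == 0:
        return 0
    # The maximum is attained by a window whose left end sits on a point,
    # so it is enough to count the points in [t_i, t_i + w] for every t_i.
    right = np.searchsorted(times, times + w, side="right")
    return int(np.max(right - np.arange(times.size)))

def mc_p_value(times, a, b, w, n_sim=999, seed=0):
    """Monte Carlo p-value under pure randomness on [a, b].

    Assumption: conditional on the number of points, a homogeneous
    Poisson process places them independently and uniformly on [a, b].
    """
    rng = np.random.default_rng(seed)
    observed = scan_statistic_1d(times, w)
    n = len(times)
    sims = np.array([scan_statistic_1d(rng.uniform(a, b, n), w)
                     for _ in range(n_sim)])
    p_value = (1 + np.sum(sims >= observed)) / (n_sim + 1)
    return observed, float(p_value)

# Usage: five arrival times on [0, 10], window width 0.5
obs, p = mc_p_value([0.0, 1.0, 1.1, 1.2, 5.0], a=0.0, b=10.0, w=0.5)
```

Only windows that start at an observed point are examined, which keeps the search finite even though t ranges over a continuum.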


Building block of Scan Test

• Repeated use of tests for equality of two Binomial or Poisson populations

• Two populations are defined by the scanning window A and its complement A^c

• As in multiple comparisons, these tests are dependent as one moves the scanning window


Spatial Scan Statistic (SSS)

• Kulldorff (1997) used SSS to detect clusters in a spatial process

• SSS can be used
– In a multi-dim point process
– With variable window size
– With an inhomogeneous Poisson process or a Bernoulli process as the baseline process


SSS (continued)

– Scanning window can be any predefined shape

– SSS is on a geographical space G with a measure µ

– In a traditional point process, G is a line and µ is a uniform measure

– In 2-dim, G is a plane and µ is Lebesgue measure


Examples

• Forestry:
– Spatial clustering of trees
– Want to look for clusters of a specific kind of tree after adjusting for the uneven spatial distribution of all trees
– µ(A) = total # of trees in region A
– nA = # of trees in A of the specific kind


Examples (continued)

• Epidemiology
– Interest in detecting geographical clusters of disease
– Need to adjust for uneven population density
  • Rural vs. urban population
– For data aggregated into census districts, the measure µ is concentrated at the central coordinates of the districts


Examples (continued)

• If interest is in space-time clusters of a disease, the measure µ will still be concentrated in the geographical region as in the prior example

• Adjusting for uneven population distribution is not always enough; confounding factors should be taken into account. E.g., in epidemiology the measure µ can reflect the standardized expected incidence rate


SS = LR statistic

• For a fixed size window, scan statistic is the maximum # of points in the window at any given time/geographical region

• The test statistic is equivalent to the LR test statistic for testing H0: θ1 = θ2 vs. Ha: θ1 > θ2, where θ1 and θ2 denote the point intensities inside and outside the window

• Generalization to LR test is important for variable window


Generalized SS: Notation/Models

• G = geographical area / study space
• A = window ⊂ G
• N(A) = random # of points in A
– A spatial point process
• Goal: find the prominent cluster

• Two useful models for the point process
– (a) Bernoulli model
– (b) Poisson model


Standard Models for SS

• For the Bernoulli model, the measure µ is such that µ(A) is an integer for all subsets A of G
– Two states (disease “point” or no disease) for each unit

• Locations of the points define a point process


LR Test: Bernoulli Model
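As a sketch of what the LR test looks like under the Bernoulli model, following the standard formulation in Kulldorff (1997): the notation n_Z (# of points in zone Z), n_G (total # of points), p (probability of a point inside Z), q (probability outside Z), and 𝒵 (the collection of candidate zones) is assumed here.

```latex
% Likelihood for a zone Z with inside probability p and outside probability q
L(Z,p,q) = p^{\,n_Z}(1-p)^{\mu(Z)-n_Z}\;
           q^{\,n_G-n_Z}(1-q)^{(\mu(G)-\mu(Z))-(n_G-n_Z)}

% Maximum likelihood estimates for fixed Z
\hat p = \frac{n_Z}{\mu(Z)}, \qquad
\hat q = \frac{n_G-n_Z}{\mu(G)-\mu(Z)}

% Likelihood under H_0 : p = q
L_0 = \left(\frac{n_G}{\mu(G)}\right)^{n_G}
      \left(1-\frac{n_G}{\mu(G)}\right)^{\mu(G)-n_G}

% Likelihood ratio test statistic
\lambda = \frac{\sup_{Z\in\mathcal{Z},\;p>q} L(Z,p,q)}{L_0}
```

For a fixed zone the supremum over p > q is attained at the estimates above when p̂ > q̂; otherwise the zone contributes L_0, so its likelihood ratio is 1.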


Poisson Model

• Under the Poisson model, points are generated by an inhomogeneous Poisson process. There is exactly one zone Z ⊂ G such that N(A) ~ Po(pµ(A∩Z) + qµ(A∩Z^c)) for all A.

• Null hypothesis H0: p = q

• Alternative hypothesis H1: p > q, Z ∈ 𝒵 (the collection of candidate zones)

• Under H0, N(A) ~ Po(pµ(A)) for all A
– The parameter Z disappears under H0


Poisson Model (continued)
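A sketch of the corresponding likelihood ratio under the Poisson model, again following the standard formulation in Kulldorff (1997). The symbols n_Z, n_G and the point locations x_i are assumed notation.

```latex
% Likelihood for zone Z with intensities p (inside) and q (outside)
L(Z,p,q) = \frac{e^{-p\,\mu(Z)-q\,(\mu(G)-\mu(Z))}}{n_G!}\;
           p^{\,n_Z}\, q^{\,n_G-n_Z} \prod_{i=1}^{n_G}\mu(x_i)

% Maximum likelihood estimates for fixed Z
\hat p = \frac{n_Z}{\mu(Z)}, \qquad
\hat q = \frac{n_G-n_Z}{\mu(G)-\mu(Z)}

% Ratio of the maximized likelihood for Z to the maximized likelihood under H_0
\frac{L(Z)}{L_0} =
\frac{\left(\dfrac{n_Z}{\mu(Z)}\right)^{n_Z}
      \left(\dfrac{n_G-n_Z}{\mu(G)-\mu(Z)}\right)^{n_G-n_Z}}
     {\left(\dfrac{n_G}{\mu(G)}\right)^{n_G}}
\quad \text{if } \hat p > \hat q, \qquad \text{and } 1 \text{ otherwise}

% Test statistic
\lambda = \max_{Z\in\mathcal{Z}} \frac{L(Z)}{L_0}
```

The exponential factor and the product over µ(x_i) are the same in numerator and denominator once the estimates are plugged in, which is why they cancel from the ratio.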


Choice of Zones

• How is the collection of zones 𝒵 selected? Possibilities:
(1) All circular subsets
(2) All circles centered at any of several foci on a fixed grid, with a possible upper limit on size
(3) Same as (2) but with a fixed size
(4) All rectangles of fixed size and shape
(5) If looking for space-time clusters, use “cylinders” scanning circular geographical areas over variable time intervals
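Option (2) can be sketched in Python for aggregated data. This is an illustration under stated assumptions: the foci are taken to be the data locations themselves rather than a separate grid, the size limit is a fraction of the total measure, locations at equal distance enter one at a time, and `circular_zones` is a hypothetical helper name.

```python
import numpy as np

def circular_zones(coords, measure, max_fraction=0.5):
    """Yield (center_index, member_indices) for circles around each location.

    A circle is represented by the set of locations it contains; it grows
    until it would hold more than max_fraction of the total measure.
    """
    coords = np.asarray(coords, dtype=float)
    measure = np.asarray(measure, dtype=float)
    limit = max_fraction * measure.sum()
    for c in range(len(coords)):
        # Distances from the focus c to every location
        dist = np.hypot(*(coords - coords[c]).T)
        order = np.argsort(dist, kind="stable")
        cumulative = np.cumsum(measure[order])
        for k in range(1, len(order) + 1):
            if cumulative[k - 1] > limit:
                break
            yield c, order[:k]

# Usage: three locations on a line with measures 1, 1, 2
zones = list(circular_zones([(0, 0), (1, 0), (10, 0)], [1, 1, 2]))
```

Because a zone only changes when the growing circle reaches another location, each focus produces at most as many distinct zones as there are locations.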


Bernoulli vs. Poisson Model

• The choice between a Bernoulli and a Poisson model does not matter much if n(G) << µ(G)

• In other cases, use the model most appropriate for the application
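A small numerical check of this point, as a sketch: the two functions below evaluate the log likelihood ratio of a single zone under each model, using the standard Bernoulli and Poisson formulas. The function names and the counts in the usage example are hypothetical.

```python
import math

def _xlogy(x, y):
    """x * log(y), with the convention 0 * log(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)

def poisson_llr(n_z, mu_z, n_g, mu_g):
    """Log LR of zone Z under the Poisson model (0 unless p_hat > q_hat)."""
    p_hat = n_z / mu_z
    q_hat = (n_g - n_z) / (mu_g - mu_z)
    if p_hat <= q_hat:
        return 0.0
    return (_xlogy(n_z, p_hat) + _xlogy(n_g - n_z, q_hat)
            - _xlogy(n_g, n_g / mu_g))

def _bernoulli_loglik(n, m, rate):
    """Log likelihood of n points among m units with success probability rate."""
    return _xlogy(n, rate) + _xlogy(m - n, 1 - rate)

def bernoulli_llr(n_z, mu_z, n_g, mu_g):
    """Log LR of zone Z under the Bernoulli model (0 unless p_hat > q_hat)."""
    p_hat = n_z / mu_z
    q_hat = (n_g - n_z) / (mu_g - mu_z)
    if p_hat <= q_hat:
        return 0.0
    alt = (_bernoulli_loglik(n_z, mu_z, p_hat)
           + _bernoulli_loglik(n_g - n_z, mu_g - mu_z, q_hat))
    return alt - _bernoulli_loglik(n_g, mu_g, n_g / mu_g)

# Usage: 200 points among 100000 units, 30 of them in a zone of 10000 units
llr_pois = poisson_llr(30, 10_000, 200, 100_000)
llr_bern = bernoulli_llr(30, 10_000, 200, 100_000)
```

With n(G) much smaller than µ(G), the extra (1 - rate) factors in the Bernoulli likelihood are all close to 1, so the two values nearly coincide.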


A Useful Result

An important result on the most likely cluster based on these models is given in the paper (Theorem 1). It states that as long as the points within the zone constituting the most likely cluster are located where they are, H0 will be rejected irrespective of the locations of the other points in G. For example, if a cluster is located in Seattle, the locations of the points on the East Coast of the U.S. do not matter.


Computations and MC

• To find the value of λ, we need to calculate the LR maximized over the collection of zones in H1. This seems like a daunting task since the # of zones could be infinite.

• The # of observed points is finite

• For a fixed # of points, the likelihood decreases as µ(Z) increases


Computations (cont’d)

• If the circle size increases for a fixed focus, the likelihood needs to be recalculated only when a new point enters the circle. For finitely many points, the # of likelihood recalculations for each focus is finite.

• The distribution of λ is difficult to obtain analytically. MC simulation is used to generate a histogram of λ: under H0, replicate the data sets conditional on nG.
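The computation and the Monte Carlo step can be sketched together in Python for aggregated data under the Poisson model. This is an illustration, not the paper's implementation: the function names are hypothetical, circles are centered on the data locations and capped at half of the total measure, and the null replicates are drawn as multinomial counts with probabilities proportional to µ, which is what conditioning on nG gives under H0.

```python
import numpy as np

def poisson_llr(n_z, mu_z, n_g, mu_g):
    """Vectorized Poisson log LR; 0 where the inside rate is not higher."""
    n_z = np.asarray(n_z, dtype=float)
    mu_z = np.asarray(mu_z, dtype=float)
    n_out, mu_out = n_g - n_z, mu_g - mu_z
    with np.errstate(divide="ignore", invalid="ignore"):
        llr = (np.where(n_z > 0, n_z * np.log(n_z / mu_z), 0.0)
               + np.where(n_out > 0, n_out * np.log(n_out / mu_out), 0.0)
               - n_g * np.log(n_g / mu_g))
    return np.where(n_z / mu_z > n_out / mu_out, llr, 0.0)

def max_llr(coords, cases, measure, max_fraction=0.5):
    """Largest log LR over circular zones and the zone attaining it."""
    coords = np.asarray(coords, dtype=float)
    cases = np.asarray(cases, dtype=float)
    measure = np.asarray(measure, dtype=float)
    n_g, mu_g = cases.sum(), measure.sum()
    best_llr, best_zone = 0.0, None
    for c in range(len(coords)):
        order = np.argsort(np.hypot(*(coords - coords[c]).T), kind="stable")
        # Cumulative sums update the counts each time a location enters
        mu_z = np.cumsum(measure[order])
        n_z = np.cumsum(cases[order])
        keep = mu_z <= max_fraction * mu_g
        if not keep.any():
            continue
        llr = poisson_llr(n_z[keep], mu_z[keep], n_g, mu_g)
        k = int(np.argmax(llr))
        if llr[k] > best_llr:
            best_llr, best_zone = float(llr[k]), order[:k + 1]
    return best_llr, best_zone

def scan_test(coords, cases, measure, n_sim=999, seed=0):
    """Observed log LR, most likely cluster, and Monte Carlo p-value."""
    rng = np.random.default_rng(seed)
    measure = np.asarray(measure, dtype=float)
    observed, zone = max_llr(coords, cases, measure)
    n_g = int(np.sum(cases))
    probs = measure / measure.sum()
    sims = np.array([max_llr(coords, rng.multinomial(n_g, probs), measure)[0]
                     for _ in range(n_sim)])
    p_value = (1 + np.sum(sims >= observed)) / (n_sim + 1)
    return observed, zone, float(p_value)

# Usage: six equally sized districts on a line, excess cases in the first two
coords = [(x, 0.0) for x in range(6)]
llr, zone, p = scan_test(coords, [20, 18, 2, 3, 2, 3], [1000] * 6, n_sim=99)
```

The p-value is the rank of the observed statistic among the simulated ones, so its resolution is limited by the number of replicates.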


Application of SSS to SIDS

• Bernoulli and Poisson models are illustrated using the SIDS data from NC

• For 100 counties in NC, the data give the total # of live births and the # of SIDS cases for 1974-84.

• Live births range from 567 to 52345

• Locations of county seats are the coordinates. The measure is the # of live births in a county


Application to SIDS (continued)

• Zones for scanning window are circles centered at a county coordinate point including at most half of the total population

• Zones are circular only wrt the aggregated data. As circles around a county seat are drawn, other counties will either be completely part of a zone or not part of it at all, depending on whether their county seats are within the circle


Bernoulli model for SIDS

• The Bernoulli model is very natural. Each birth can correspond to at most one SIDS case. Table 1 summarizes the results of the analysis.

• From Figure 1, the most likely cluster, A, consists of Bladen, Columbus, Hoke, Robeson, and Scotland counties.

• Using a conservative test, a secondary cluster, B, consists of Halifax, Hertford, and Northampton counties.


Poisson model for SIDS

• For a rare disease such as SIDS, the Poisson model gives a close approximation to the Bernoulli model. Results are reported in Table 1

• Both models detect the same cluster

• P-values for the primary cluster are the same for both models; p-values for the secondary cluster are very close


Application to SIDS (continued)


Two significant clusters based on SSS


SSS adjusted for Race

• For SIDS one useful covariate is race

• Race is related to SIDS through unobserved covariates such as quality of housing, access to health care

• Overall incidence of SIDS for white children is 1.512 per 1000 and for black children is 2.970 per 1000.
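One way such an adjustment can enter the scan statistic is through the measure: under the Poisson model, µ for a county can be set to the expected # of cases given its racial composition (indirect standardization). The sketch below uses the two overall rates quoted above; the birth counts are hypothetical, and whether the paper computes µ in exactly this way is an assumption.

```python
import numpy as np

# Overall SIDS incidence per live birth, from the rates quoted above
RATE_WHITE = 1.512 / 1000
RATE_BLACK = 2.970 / 1000

def expected_cases(births_white, births_black):
    """Race-adjusted measure: expected # of SIDS cases per county."""
    births_white = np.asarray(births_white, dtype=float)
    births_black = np.asarray(births_black, dtype=float)
    return RATE_WHITE * births_white + RATE_BLACK * births_black

# Usage with hypothetical birth counts for three counties
mu = expected_cases([4000, 1000, 500], [1000, 1000, 2500])
```

Two counties with the same total # of births then receive different values of µ when their racial distributions differ, so an excess of cases is judged against the race-specific expectation.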


SSS: race-adjusted (continued)

• Racial distribution differs widely among the counties in NC

• This analysis leads to the same primary cluster (see Figure 2)

• The previous secondary cluster disappears, but a new secondary cluster, C, emerges. Cluster C consists of several counties in the western part of the state


Application to SIDS (continued)


SSS to SIDS adjusted for race


A Bayesian alternative to SSS

• Scott and Berger (2006): Idea of Bayesian multiple testing.

• Observe Xj ~ N(µj, σ²), j=1,…,M

• To determine which µj are nonzero we have M (conditionally) independent tests, each testing

H0j: µj = 0 vs. H1j: µj ≠ 0

• p0 = prior probability that µj is zero

• Crucial point here: let the data estimate p0.

• S&B use the hierarchical model
1. Xj | µj, σ², γj ~ N(γjµj, σ²), independently
2. µj | τ² ~ i.i.d. N(0, τ²), γj | p0 ~ i.i.d. Bern(1-p0)
3. (τ², σ²) ~ π(τ², σ²) = (τ² + σ²)^(-2), p0 ~ π(p0)

• Several choices for π(p0): Uniform, Beta(a,1)

• S&B computed the posterior probability that γj = 1.
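The idea of letting the data estimate p0 can be illustrated with a simplified Python sketch. It is not the S&B computation: σ² and τ² are treated as known instead of receiving the prior in step 3, π(p0) is taken to be uniform, the integral over p0 is done on a grid, and `posterior_nonzero` is a hypothetical name.

```python
import numpy as np

def normal_pdf(x, var):
    """Density of N(0, var) evaluated at x."""
    return np.exp(-0.5 * x**2 / var) / np.sqrt(2 * np.pi * var)

def posterior_nonzero(x, sigma2, tau2, grid_size=999):
    """Posterior probability that gamma_j = 1 for each observation.

    Simplifying assumptions: sigma2 and tau2 are known, p0 ~ Uniform(0, 1).
    """
    x = np.asarray(x, dtype=float)
    f0 = normal_pdf(x, sigma2)          # marginal density of X_j if gamma_j = 0
    f1 = normal_pdf(x, sigma2 + tau2)   # marginal density of X_j if gamma_j = 1
    p0 = (np.arange(grid_size) + 0.5) / grid_size   # midpoint grid on (0, 1)
    mix = p0[:, None] * f0 + (1 - p0[:, None]) * f1
    # Posterior of p0 on the grid (uniform prior, so proportional to likelihood)
    log_post = np.log(mix).sum(axis=1)
    weights = np.exp(log_post - log_post.max())
    weights /= weights.sum()
    # P(gamma_j = 1 | x_j, p0), averaged over the posterior of p0
    inclusion_given_p0 = (1 - p0[:, None]) * f1 / mix
    return weights @ inclusion_given_p0

# Usage: three observations near zero and one far from zero
probs = posterior_nonzero([0.1, -0.2, 0.3, 5.0], sigma2=1.0, tau2=9.0)
```

Because all M observations inform the posterior of p0, the evidence required to declare a single µj nonzero adapts to how many of the other observations look like noise.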


Modification of S&B Model

• Assume Xj ~ N(µj, σ²), j=1,…,M

• To determine which µj are positive we have M (conditionally) independent tests, each testing

H0j: µj = 0 vs. H1j: µj > 0

• As before
1. Xj | µj, σ², γj ~ N(γjµj, σ²), independently
2. µj | µ(-j), ρ, τ² ~ N(ρ∑qjkµk, τ²) [CAR], γj | pj ~ ind. Bern(1-pj)
3. (τ², σ², ρ) ~ π(τ², σ², ρ) = (τ² + σ²)^(-2)
4. CAR model on logit(pj)

• Compute the posterior probability that µj > 0.