Review of software for space-time disease surveillance


REVIEW Open Access

Colin Robertson*, Trisalyn A Nelson

Abstract

Disease surveillance makes use of information technology at almost every stage of the process, from data collection and collation, through to analysis and dissemination. Automated data collection systems enable near-real-time analysis of incoming data. This context places a heavy burden on software used for space-time surveillance. In this paper, we review software programs capable of space-time disease surveillance analysis, and outline some of their salient features, shortcomings, and usability. Programs with space-time methods were selected for inclusion, limiting our review to ClusterSeer, SaTScan, GeoSurveillance and the Surveillance package for R. We structure the review around stages of analysis: preprocessing, analysis, technical issues, and output. Simulated data were used to review each of the software packages. SaTScan was found to be the best equipped package for use in an automated surveillance system. ClusterSeer is more suited to data exploration, and learning about the different methods of statistical surveillance.

Introduction

Disease surveillance is an ongoing process of information gathering, organizing, analyzing, interpreting, and communicating. It is the principal means by which public health information is generated and disseminated, informing policy, research, and response measures. For outbreaks of infectious disease, timely information on the spread of cases in space and time can facilitate action by public health officials [e.g., [1]]. For chronic and endemic diseases, monitoring space-time trends in disease occurrence can highlight changing patterns in risk and help identify new risk factors [e.g., [2]]. Analysis of spatial-temporal patterns in public health data is an increasingly common task for public health analysts as more surveillance data become available. Surveillance datasets are often massive in size and complexity, and the availability and quality of software capable of analyzing space-time disease surveillance data on an ongoing basis is integral to practical surveillance [3-5]. Geographic information systems (GIS) used for disease mapping can visualize the spatial variation in disease risk. However, statistical methods are often required to detect changes in the underlying disease process. GIS are also poorly equipped to handle temporal data [6].

In Fall of 2008, a workshop on training priorities in the use of GIS in health research conducted in Victoria, British Columbia, polled 78 researchers, graduate students, faculty, and others working in health and GIS regarding barriers to the use of space-time disease surveillance [7]. Training and software availability were cited as the primary barriers to the uptake of space-time disease surveillance. Currently, statistical methods for space-time disease surveillance are not included in most conventional GIS or statistical software. These methods are available in specialist cluster analysis software such as ClusterSeer (http://www.terraseer.com), or as extensions to general statistical analysis software packages (e.g., R, S-Plus). Our goal is to provide researchers and public health analysts with a review and demonstration of software packages for space-time disease surveillance. We aim to facilitate expanded use of these methods by providing a means to quickly determine the software options and to identify the ways in which programs differ. We limit our scope to methods that use both space and time, rather than purely temporal or spatial analysis.

This paper is organized as follows. First, we briefly review basic classes of methods for space-time disease surveillance in the background section. Readers familiar with these methods may wish to skip ahead. Second, in the methods section we outline how we selected software to review, the review methodology and datasets used to demonstrate software features. Third, we present the results of our review. Finally, we conclude with some guidelines for the use of these software packages for public health researchers and analysts.

* Correspondence: [email protected]
Spatial Pattern Analysis & Research (SPAR) Laboratory, Dept of Geography, University of Victoria, PO Box 3060, Victoria, BC V8W 3R4, Canada

Robertson and Nelson International Journal of Health Geographics 2010, 9:16
http://www.ij-healthgeographics.com/content/9/1/16

© 2010 Robertson and Nelson; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Background

Statistical approaches to disease surveillance have been the subject of a number of texts and review papers [8-10]. A key factor in the selection of methods of analysis is the objective of surveillance, such as outbreak detection, trend monitoring, case detection, or situational awareness. Additional contextual factors are also important to consider, such as scale and scope of the system, disease characteristics, and technical considerations [11]. Methods can be broadly categorized as either statistical tests or model-based approaches. Statistical tests are the dominant class of approaches used for outbreak detection. The aim of most methods is to test a subset of data, defined by spatial and temporal constraints (i.e., a window or kernel), against an expected rate of disease occurrence over the study area as a whole. Methods differ with respect to how the window that defines each subset is constructed, how statistical significance is determined, and how the baseline expectation varies over space and time.

The most widely used testing methods are cumulative sum (cusum) methods and scan statistics. Briefly, cusum approaches keep a running sum of deviations from the expected value, and once the cumulative deviation reaches some threshold, an alarm is triggered. For space-time applications, individual cumulative sums for each area under surveillance are monitored and can be adjusted for spatial relationships [12]. Depending on the statistic being monitored in the cusum, different surveillance objectives can be addressed. For example, a measure of spatial pattern monitored in a cusum framework can be sensitive to slight changes in spatial pattern which may signal a shift in dynamics of an endemic disease [e.g., [13]]. Scan statistics are used mostly in outbreak detection contexts. Here, circular search windows of varying radii scan a map of disease and test if the number of cases within the search area is unexpectedly high. In the space-time scan statistic [14,15], the search area is extended to a cylinder where the height of the cylinder is defined by time periods of varying lengths. The significance of the most likely cluster is assessed using Monte Carlo simulations.

Modeling approaches are used mostly for adjusting the expected number of cases (i.e., denominator) of disease. Disease incidence varies spatially with population and known risk factors. Disease mapping models aim to estimate the true relative risk across the study area by incorporating the spatial variation in these risk factors. The standardized mortality ratio (SMR) is the crudest measure of risk, computed as the observed cases divided by the expected cases in each area. The SMR is often of limited use in surveillance because it can fluctuate widely for rare diseases or in rural areas where populations are small. Further, abrupt (i.e., unrealistic) changes at the boundaries of areal units are sometimes observed. Models allow covariate effects to be estimated, and allow sparsely populated areas to have their expected values adjusted towards the mean (i.e., borrow strength). When used in surveillance applications, models confer these same advantages. Disease surveillance models have been either space-time Bayesian models [e.g., [16]] or generalized linear mixed models [e.g., [17]]. Modeling approaches are complementary to other methods, as tests are still required to determine how well the most recently observed data fit with the model [18]. Models can also be refit over time to adjust to long-term changes in disease occurrence or surveillance effort/efficacy (e.g., improved diagnostic tests), and parameters can be included to model spatial relationships and seasonal and day-of-the-week effects, which are common features of some types of disease surveillance data.

In addition to testing and modeling methods, new computation-based tools are also being developed for surveillance. These approaches tend to be either at experimental or theoretical stages, or algorithms designed for specific surveillance systems. Some hybrid approaches include networks [19], simulation-based methods [20], and space-time hidden Markov models [21]. While many of these new approaches appear promising, most are not yet available in software.
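The cusum idea described above (a running sum of deviations from expectation that raises an alarm at a threshold, monitored separately for each area) can be illustrated with a minimal Python sketch. This is not the algorithm used by any of the reviewed packages: the function names, the allowance `k`, the threshold `h`, the use of raw rather than standardized counts, and the toy data are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence


def cusum_alarms(observed: Sequence[float], expected: Sequence[float],
                 k: float = 0.5, h: float = 4.0) -> List[int]:
    """Return the time indices at which a one-sided cusum signals an alarm.

    k (allowance) and h (threshold) are illustrative values only; they are
    not taken from the paper or from any reviewed package.
    """
    s = 0.0
    alarms = []
    for t, (obs, exp) in enumerate(zip(observed, expected)):
        # Accumulate deviations above expectation; the sum never drops below 0.
        s = max(0.0, s + (obs - exp) - k)
        if s > h:
            alarms.append(t)
            s = 0.0  # restart after an alarm (one common convention)
    return alarms


def parallel_cusum(counts_by_area: Dict[str, Sequence[float]],
                   expected_by_area: Dict[str, Sequence[float]],
                   k: float = 0.5, h: float = 4.0) -> Dict[str, List[int]]:
    """Run an independent univariate cusum for every area under surveillance."""
    return {area: cusum_alarms(counts, expected_by_area[area], k, h)
            for area, counts in counts_by_area.items()}


# Hypothetical weekly counts for two areas; DA1 has a spike in weeks 4 to 6.
counts = {"DA1": [2, 1, 3, 2, 6, 7, 8, 2], "DA2": [1, 2, 1, 2, 1, 2, 1, 2]}
expected = {"DA1": [2.0] * 8, "DA2": [1.5] * 8}
alarms = parallel_cusum(counts, expected)
print(alarms)
```

In this toy run only the area with the sustained excess raises alarms. In practice the choice of `k` and `h` governs the trade-off between false alarms and detection delay, which is the parameterization issue this review does not cover.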

Methods

Inclusion Criteria

Software programs were included for review based on two criteria: the program had methods that handled both space and time, and the methods were built in to the software (i.e., not requiring programming). These criteria constrained our review to four software packages (SaTScan 8.0, ClusterSeer 2.3, GeoSurveillance 1.1, Surveillance package 1.1-2 for R). Comprehensive disease surveillance systems (also sometimes called health information systems), which include data collection and processing routines, database components, and system-specific analysis and visualization modules, were excluded (e.g., RODS [22], AEGIS [23]). These systems are large in scale and generally implemented at an enterprise level; they are not readily accessible to researchers/analysts. Research tools based purely on programming (e.g., WinBUGS; MatLab) were also excluded. Details of the software packages included in the review are outlined in Table 1.

Reviewing Framework

Software programs were reviewed for broad steps of typical data analysis: preprocessing, analysis (methods and technical issues), and output (Table 2). Preprocessing is required to transform data into the appropriate structure for a particular software package. In many scenarios, health event data are collected at an address level, which needs to be compared to population estimates, available usually as polygon census data [24]. Our assessment of data preprocessing requirements reflected typical data by considering both point event case data and polygonal administrative units. For each software package, we assessed the data formatting steps required to perform an analysis.

The second step is conducting the analysis, and we briefly describe methods and analysis options for each software package. We highlight technical issues and potential problems or requirements such as stability, speed of computation, and required operating systems. The final step is outputting results, and we give an overview of the output options available in each package. In addition, we qualitatively assess user facility based on our experience operating the software with test datasets. It should be noted that we do not discuss parameterization of different methods. This is a major issue in practical surveillance, better suited to a review and comparison of surveillance methods themselves.

Datasets

Data were simulated to model a syndromic surveillance system monitoring calls to a health hotline in the Greater Vancouver Area. For simplicity, we refer to each simulated call as a case. Cases were simulated over one year from January 1st to December 31st. Cases were aggregated to census dissemination areas (DA) and were spatially allocated proportional to the population in each census DA. The total population in all DAs was 578,642, and total cases were 4303, giving an annual incidence of 743.64 cases per 100,000. This level of incidence is similar to what might be expected for the total volume of calls made to a telephone health hotline in a major Canadian city [25].

Outbreaks were inserted into the baseline data to represent spikes in calls which, in a syndromic surveillance setting, would signal an unusual health event. Two outbreak scenarios were simulated in separate datasets. In outbreak one, a simulated outbreak started on March 4th and lasted until June 5th, with 148 cases occurring over 10 km², covering 33 geographically adjacent census DAs (light grey cluster, Figure 1). Outbreak cases were allocated proportional to census DA population. In outbreak two, 6 spatial clusters constituting a total of 501 cases occurred over an area of 16 km², covering a total of 104 census DAs (dark grey cluster, Figure 1). The number of cases in clusters ranged from 51 to 140, and cases occurred over the full year. Data were stored in Environmental Systems Research Institute (ESRI) shapefile format, a standard spatial data format which can represent data as points, polygons, or lines.
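The incidence figure and the population-proportional allocation described above can be reproduced in outline with a short Python sketch. This is not the authors' simulation code: the function names, the random seed, and the three toy dissemination areas are assumptions for illustration; only the totals of 4303 cases and 578,642 people come from the text.

```python
import random
from collections import Counter
from typing import Dict


def incidence_per_100k(cases: int, population: int) -> float:
    """Annual incidence expressed per 100,000 population."""
    return cases / population * 100_000


def allocate_cases(populations: Dict[str, int], n_cases: int,
                   seed: int = 0) -> Dict[str, int]:
    """Assign n_cases to areas with probability proportional to population."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    areas = list(populations)
    weights = [populations[a] for a in areas]
    draws = rng.choices(areas, weights=weights, k=n_cases)
    tally = Counter(draws)
    # Keep areas that received no cases, with an explicit zero count.
    return {a: tally.get(a, 0) for a in areas}


# Totals quoted in the text.
rate = round(incidence_per_100k(4303, 578_642), 2)

# Hypothetical toy populations; an empty area can never receive a case.
toy_populations = {"DA1": 500, "DA2": 1500, "DA3": 0}
toy_cases = allocate_cases(toy_populations, 200)
print(rate, toy_cases)
```

An outbreak could be layered on top in the same way, by allocating additional cases only among the areas inside the chosen cluster.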

Table 1 List of software packages for review of space-time disease surveillance software

Software Package | Source | Reference | Description
SaTScan 8.0 | http://www.satscan.org | Kulldorff and Information Management Services 2009 [38] | Cluster detection software with several spatial, temporal and space-time scan statistics.
ClusterSeer 2.3 | http://www.terraseer.com/ | Jacquez et al. 2002 [39] | Cluster analysis software includes many methods for spatial, temporal, and space-time analysis.
GeoSurveillance 1.1 | http://www.acsu.buffalo.edu/~rogerson/geosurv.htm | Yamada et al. 2009 [40] | Implementation of cumulative sum surveillance statistics.
Surveillance package 1.1-2 | http://cran.r-project.org/web/packages/surveillance/index.html | Höhle 2007 [41] | Package for statistical surveillance includes test-based and model-based methods.

Table 2 Criteria and review approach for review of space-time disease surveillance software

Criteria | Review
Data preprocessing | Number of steps involved to process a point event (cases) shapefile and a polygon census shapefile (population)
Methods | Description of methods offered by each program
Technical issues | Speed of computation, system stability, automation, operating requirements
Analysis output | Output options (graphs, maps, reporting)
User facility | Qualitative assessment rated on scale of 1 - 5 on each of: ease of learning; use; set up; documentation/help

Figure 1 Outbreaks simulated to review software packages for space-time disease surveillance (Outbreak one - light grey; Outbreak two - dark grey). Outbreak one consisted of one large compact cluster. Outbreak two was composed of several clusters occurring at different times throughout the region.

Review of Programs

Data preprocessing

The steps involved in preprocessing the test data for analysis in each software program are outlined in Table 3. SaTScan requires data to be input as three separate files to run the appropriate analysis for these data (retrospective space-time scan, Poisson model): one file stores the spatial locations (geo file), another stores the cases (case file), and a third stores the population of each area (population file). All SaTScan files are text-based, and an import tool is provided for importing common data formats (e.g., CSV, DBF). SaTScan also provides the functionality to aggregate the data temporally into years, months, or days. Thus, data can be input at the finest temporal resolution. This functionality turned out to be a key advantage over other programs as it limited the amount of data restructuring required when trying different analysis parameters.

Table 3 Data preprocessing steps for each software package to perform a space-time analysis starting with daily data as point events in an ESRI point shapefile and a polygon shapefile of census dissemination area boundaries

Software | Type of Analysis | Required Data Structure | Data Preprocessing Steps
SaTScan | Space-time cluster scan with Poisson model | Case file with number of cases, date, and DA id; population file with population, date, and DA id; coordinates file with DA id, centroid X and Y coordinates | Associate DA identifier with each point event; calculate DA centroid coordinates
ClusterSeer | Space-time cluster scan with Poisson model | One table with population; one table with counts of cases for each location and date during study period | Associate DA identifier with each point event; calculate week numbers; aggregate cases by week for each DA (zero counts included)
GeoSurveillance | Univariate cusum on individual DAs | DA shapefile with counts of number of cases for each time period named and ordered sequentially in the table | Calculate week numbers; split point events into unique shapefiles for each week; count number of events in each DA by week (zero counts included); calculate weekly counts as new fields
R-Surveillance | Univariate cusum on individual DAs | Matrix of counts of cases with spatial locations as columns and time periods as rows | Calculate week numbers; split point events into unique shapefiles for each week; count number of events in each DA by week (zero counts included); calculate weekly counts as new fields; read table into R as matrix and transpose

ClusterSeer requires unique records for every space-time unit under surveillance. Running a daily space-time scan statistic for our simulated data would require a dataset with four columns (location, date, cases, population) and 478,515 records (365 days × 1311 census DAs). Additionally, all areas need a record for every time period. Generating the necessary table required use of specialized data restructuring functions in R statistical software (reshape package). Data were aggregated to counts of cases by week (52 weeks × 1311 census DAs), giving a table with 68,172 records. For weeks where DAs had no cases, zero counts had to be inserted.

Preparing data for analysis in GeoSurveillance required aggregation temporally and spatially. Counts of cases were required to be attributes of the polygon shapefile (or text file), and fields were required to be named in sequential order. This process was automated by custom programming in ArcGIS which performed spatial joins and added new fields to the attribute table. This was an extensive process to get the data in the proper format for analysis, and similar to ClusterSeer, GeoSurveillance does not allow flexibility in the level of temporal aggregation. ClusterSeer and GeoSurveillance can both read in polygon shapefiles and automatically calculate centroid coordinates.

For analysis with the Surveillance package in R, data were required to be in a matrix with temporal observations as rows and spatial units as columns, giving a 365 × 1311 matrix for daily analysis and 52 × 1311 for weekly analysis. All of the programs except SaTScan had inflexible data input requirements, specifically for temporal aggregation of cases. None of the software programs could input the two shapefiles (points and polygons) without any data preprocessing. The relative flexibility of SaTScan was surprising, as previous experience and a review of SaTScan [26] suggested cumbersome input format as a major limitation of SaTScan.

Methods

The programs reviewed here are of two types: specialized implementations of a specific class of surveillance algorithms (SaTScan, GeoSurveillance) and full-suite surveillance/space-time analysis packages that implement multiple methods (ClusterSeer, R-Surveillance). SaTScan offers a number of scan statistics such as spatial [27], temporal [28], and space-time versions [14,15], as well as retrospective and prospective (clusters must be current) modes. Different data types can be accommodated by the many probability models including Poisson, Bernoulli, space-time permutation, multinomial, ordinal, exponential, and normal. The circular search area used in the classical scan statistic can also be altered to search using an ellipse, or along user-defined connections of spatial units. GeoSurveillance implements the cusum approach to surveillance [e.g., [13]]. The retrospective mode does global spatial analysis only (i.e., reports one cusum test statistic for the map), while the prospective mode does univariate parallel surveillance with the cusum statistic. The multivariate cusum is not yet implemented in GeoSurveillance.

ClusterSeer had the widest range of space-time methods implemented. Those particularly suited to disease surveillance included space-time scanning [14], a cusum approach similar to that in GeoSurveillance [13], and tests for space-time interaction [29-31]. This makes ClusterSeer a useful tool for exploring disease surveillance data. Once data are formatted for use in ClusterSeer, a variety of methods can be used to examine the data. The R-Surveillance package contains a number of algorithms such as the Farrington et al. (1996) method [32], Poisson cusum [33], and the two-component negative binomial model in Held et al. [34]. The algorithms in the surveillance package are mostly model-based and non-spatial, though some space-time surveillance applications can be treated as a multivariate time series problem.

Technical Issues

Technical issues encountered in running the software programs varied considerably. SaTScan was capable of running the space-time scan statistic in retrospective mode on daily case data. ClusterSeer was not run on daily data. Initially, memory requirements were a serious limitation of undertaking analysis in ClusterSeer with both test datasets; however, an updated version (2.3.22.0) was obtained to complete the analysis on weekly data. The analysis took longer to run than on SaTScan with daily data, though results were very similar. GeoSurveillance ran the univariate cusum in parallel on each of the 1311 census DAs. The analysis ran well on weekly data; however, the linked display between the maximum cusum and the map was very slow. The cusum methods were also used for our analysis in R-Surveillance. The time taken to run the analysis on the weekly data was similar to that of GeoSurveillance and results were also similar.

R-Surveillance is the only package that runs on Windows, Mac and Linux operating systems. Currently, SaTScan has versions for Windows and Linux, and a Mac version is in development. Both ClusterSeer and GeoSurveillance run only on the Windows operating system. All analyses were run on a Pentium 4 PC with a 3.00 GHz processor and 2 GB of RAM running the Windows XP operating system. SaTScan completed analysis in the shortest time compared to all other programs.

Data Output

Output options in SaTScan are limited to text file and database file output. Database files can be linked back to the input shapefile in a GIS for further examination of clusters; however, no data exploration functionality is available in SaTScan itself. In GeoSurveillance, results of an analysis can be written to a text file which can be easily manipulated in other software. GeoSurveillance provides a basic map interface linked to a list of cusum scores. A cusum chart is also displayed showing the temporal pattern of cusum scores for the study area as a whole and for individual units.

ClusterSeer has advanced data output facilities such as mapping and graphing, which can be exported as images. Results can also be exported with the data to new files for further examination inside statistical or GIS software. The Surveillance package has access to extensive visualization and exporting functions available in the R environment. The objects specific to the Surveillance package also have default methods for creating plots. This of course requires familiarity with the R programming language.

User Facility: Ease of Learning, Ease of Use, Help & Documentation

Usability is an important part of software as public health organizations have limited resources available for technical training. Our review of user facility is presented in Table 4. ClusterSeer includes an extensive help menu explaining the parameters and required data for all of the methods. The help system also includes tutorials and example datasets that work through many of the methods. This is an important resource for learning methods of spatial and space-time analysis. The graphical user interface (GUI) of ClusterSeer makes learning and use straightforward. SaTScan is also a GUI-based system composed of three main screens: input, analysis, and output. The help menu in SaTScan is extensive, with descriptions of the scan statistic methodology, explanations of parameters and data input and output options, sample datasets, and references for further reading. GeoSurveillance has two basic modes which are run from menus of a simple GUI. The program is easy to use after data have been formatted properly (as described above). Currently there is no help built into the system itself. The menus are described in a separate Word document. A tutorial and sample datasets are also provided. Having these outside of the program itself makes navigating the documentation cumbersome. R-Surveillance is an R package and as such has help in the R package format, which can be called directly from R. This includes descriptions of parameters and values for all of the implemented functions in the package. Basic examples are given, although detailed descriptions of the statistical methods are lacking. Users should be familiar with using R packages and the background statistical methodology before using the surveillance package.
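The restructuring described under Data preprocessing (weekly counts with explicit zeros for every DA, arranged either as a time-by-area matrix or as one record per area and week) was done by the authors in R and ArcGIS. The Python sketch below only illustrates the shape of that transformation; the function names, the 7-day week definition, the year, and the toy events are assumptions and do not reproduce the authors' scripts or any package's import format.

```python
from collections import Counter
from datetime import date
from typing import Dict, Iterable, List, Tuple


def week_index(d: date, start: date) -> int:
    """Zero-based week number, counted in 7-day blocks from the start date."""
    return (d - start).days // 7


def weekly_counts(events: Iterable[Tuple[str, date]], areas: List[str],
                  start: date, n_weeks: int) -> List[List[int]]:
    """Count events per week and area, keeping explicit zeros.

    Returns a matrix with weeks as rows and areas as columns. Events that
    fall after the last requested week are ignored in this sketch.
    """
    tally = Counter((week_index(d, start), area) for area, d in events)
    return [[tally.get((w, a), 0) for a in areas] for w in range(n_weeks)]


def to_long_table(matrix: List[List[int]], areas: List[str],
                  populations: Dict[str, int]) -> List[Tuple[str, int, int, int]]:
    """Flatten to one (area, week, cases, population) record per space-time unit."""
    return [(a, w, row[j], populations[a])
            for w, row in enumerate(matrix)
            for j, a in enumerate(areas)]


# Hypothetical toy data: three areas, four point events already tagged with a DA id.
areas = ["DA1", "DA2", "DA3"]
populations = {"DA1": 420, "DA2": 610, "DA3": 380}
events = [("DA1", date(2009, 1, 1)), ("DA1", date(2009, 1, 3)),
          ("DA2", date(2009, 1, 9)), ("DA1", date(2009, 1, 20))]

matrix = weekly_counts(events, areas, start=date(2009, 1, 1), n_weeks=3)
long_table = to_long_table(matrix, areas, populations)
print(matrix)
print(len(long_table))
```

With 52 weeks and 1311 areas the long table would hold the 68,172 records mentioned in the text. Keeping the raw events at daily resolution and aggregating on demand, as in `weekly_counts`, is the kind of flexibility the review credits to SaTScan's own input handling.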

Conclusions

With the advent of electronic medical records, syndromic data sources, and low-cost location sensors, disease data are increasingly encoded with both spatial and temporal information. These new data sources represent an opportunity for greater understanding of disease distributions, risk factors, and changes to population health over time and space. While analysis of surveillance data represents an expanding opportunity for public health practice and research, these new datasets, methods, and software also bring challenges. There are inherent problems in using traditional statistics for hypothesis testing, or applying simple GIS visualization, to these data sources. As is evidenced by the growing literature on statistical surveillance of disease data [9], methods need to be specifically suited to these data. In addition to statistical methods, however, computer software is now essential for the analysis of surveillance data.

The four software programs reviewed in this paper provide functionality for different kinds of analysis and serve different purposes. Based on our review, SaTScan is the most developed and robust software package for implementation in an automated cluster detection system.

Table 4 Comparative review of software packages for space-time disease surveillance: User Facility

Software | Learning | Use | SetUp | Help/Documentation | Comments
SaTScan | 4 | 5 | 5 | 4 | Requires knowledge of scan statistics. Basic analysis is straightforward though many advanced options available. Well referenced methodology in the user guide.
ClusterSeer | 5 | 5 | 3 | 5 | Excellent documentation and learning resources for the many different methods. Data format requirements can be cumbersome.
GeoSurveillance | 3 | 3 | 3 | 3 | Data structure is peculiar, though the basic user interface is straightforward. Documentation not integrated within the menu itself.
R - Surveillance | 1 | 3 | 5 | 2 | Command driven system requires knowledge of R language. Examples are easy to replicate. Very easy to install within R. Documentation is not extensive.


However, SaTScan only implements scan statistic methods, so those wishing to explore modeling-based approaches may want to use the Surveillance package. Additionally, examining SaTScan results in detail requires other software for graphing and mapping. Reasons for taking a modeling approach include making refined estimates of expected rates based on modeled covariate effects, adjusting for spatial heterogeneity in disease rate, and smoothing relative risks. The Surveillance package implements models, but currently has very limited capability for true space-time surveillance. Its large number of temporal methods makes it a useful environment for exploring surveillance data, in addition to the advantages afforded by being able to integrate with other R packages. As a command-based system, it is also easy to automate and integrate with data processing scripts. The learning curve for R is quite steep, and those requiring a GUI-based system to explore surveillance data would be better served by ClusterSeer. The extensive documentation and many purely spatial and temporal methods, in addition to space-time methods, make ClusterSeer a convenient tool for initial data exploration. There is also a range of output options in ClusterSeer. ClusterSeer may be more suitable for exploratory studies than as part of an ongoing, automated cluster detection system because there is limited capacity for automated surveillance. ClusterSeer project files can be set to run automatically, though because they are binary files they cannot be automatically configured to increment parameters (e.g., study period). Finally, though methods (and software) have been classified as testing or model-based approaches, it is important to note that these approaches are complementary rather than opposing [18].
For example, one approach is to develop a model of the expected risk of disease using the Surveillance package, and use the estimated smoothed rates as the expected values in a SaTScan analysis.

All of the programs reviewed in this paper were applications installed on a local computer. While this is the architecture of most computer software applications, new developments in computing are taking advantage of the internet to perform ongoing, high-powered computing tasks [35]. Online delivery of analytic services (such as cluster analysis) allows software to be centralized on one server, and accessible from anywhere with an internet connection. In the context of disease surveillance, this could facilitate standardization of analysis among different regional health authorities, increase transparency of analysis, and offer significant improvements in costs and performance. Initial steps towards web-based surveillance analysis are underway, with a web-based version of ClusterSeer (https://www.clusterseer.com) currently in development, RWeb [36], a web-based interface to a server instance of R, as well as a newer project called rapache [37], which integrates R into the popular Apache web server. These developments hold considerable promise for the development of future surveillance systems.

The threat of emerging diseases and the growing burden of chronic diseases require integrated approaches to surveillance. Analysis of disease trends in space-time can provide context which can be linked to possible risk factors in a research environment, flag unusual events in an automated surveillance system, and give epidemiologists current information during an outbreak. Well-studied and understood methods are required to ensure appropriate use and transparent and reproducible results. The literature on statistical surveillance is extensive and provides this basis, yet software implementations are far from standardized. As space-time surveillance statistical methods mature further, software is also sure to improve. Open-source environments, such as R, may be the optimal venue for future development of surveillance software as they afford easy integration with many statistical and mapping packages and, being open-source, allow the underlying code to be viewed and modified easily. However, data structure remains a major issue when handling space-time data, especially when data have to be moved between different software packages. Standardized space-time data classes in R or another open-source environment may be a fruitful area of development.
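Two ideas from these conclusions, feeding model-based expected values into a scan statistic and holding space-time counts in one standardized structure, can be illustrated together. The following is a minimal Python sketch under stated assumptions, not the implementation used by SaTScan or the Surveillance package: the `SpaceTimeCounts` class, its (region, period) key layout, and the toy counts are all hypothetical, and the expected values simply stand in for output from a fitted model. The likelihood ratio is the standard Poisson form for a single candidate cylinder; no search over cylinders or Monte Carlo inference is attempted.

```python
from dataclasses import dataclass
from math import log
from typing import Dict, Iterable, Tuple

Cell = Tuple[str, int]  # (region id, time period index); hypothetical key layout


@dataclass
class SpaceTimeCounts:
    """Observed and model-based expected case counts per (region, period)."""
    observed: Dict[Cell, int]
    expected: Dict[Cell, float]

    def rescaled_expected(self) -> Dict[Cell, float]:
        """Scale expected counts so that they sum to the observed total."""
        total_obs = sum(self.observed.values())
        total_exp = sum(self.expected.values())
        return {k: v * total_obs / total_exp for k, v in self.expected.items()}

    def cylinder_totals(self, regions: Iterable[str],
                        periods: Iterable[int]) -> Tuple[int, float]:
        """Observed and rescaled expected counts inside a space-time cylinder."""
        expected = self.rescaled_expected()
        periods = list(periods)
        cells = [(r, t) for r in regions for t in periods]
        c = sum(self.observed.get(k, 0) for k in cells)
        e = sum(expected.get(k, 0.0) for k in cells)
        return c, e


def poisson_llr(c: float, e: float, total: float) -> float:
    """Poisson log-likelihood ratio for one candidate cluster.

    c: observed cases inside, e: expected cases inside (rescaled),
    total: all observed cases. Only excess risk (c > e) scores above zero.
    """
    if e <= 0 or c <= e:
        return 0.0
    inside = c * log(c / e)
    rest = total - c
    outside = rest * log(rest / (total - e)) if rest > 0 else 0.0
    return inside + outside


# Toy data: two regions, three periods; region A has an excess in period 2.
observed = {("A", 0): 5, ("A", 1): 5, ("A", 2): 12,
            ("B", 0): 5, ("B", 1): 5, ("B", 2): 5}
# Stand-in for smoothed expected counts produced by a fitted model (assumed).
expected = {cell: 5.0 for cell in observed}

data = SpaceTimeCounts(observed=observed, expected=expected)
total = sum(data.observed.values())

c, e = data.cylinder_totals(["A"], [2])
llr_cluster = poisson_llr(c, e, total)

c0, e0 = data.cylinder_totals(["B"], [0, 1, 2])
llr_quiet = poisson_llr(c0, e0, total)

print(f"Region A, period 2: LLR = {llr_cluster:.3f}")
print(f"Region B, all periods: LLR = {llr_quiet:.3f}")
```

Keeping observed and expected counts in a single keyed structure is one possible answer to the data-movement problem noted above: the same object could be filled from a model fit and then passed to any scan routine without reformatting.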

Acknowledgements
This project was supported in part by the Teasdale-Corti Global Health Research Partnership Program, Natural Sciences and Engineering Research Council of Canada, and GeoConnections Canada.

Authors' contributions
CR and TN conceived of the review, and participated in its design. CR performed all software analysis and data processing. All authors read and approved the final manuscript.

Competing interests
The authors declare that they have no competing interests.

Received: 22 December 2009 Accepted: 12 March 2010 Published: 12 March 2010

References

1. Ekpo UF, Mafiana CF, Adeofun CO, Solarin AR, Idowu AB: Geographical information system and predictive risk maps of urinary schistosomiasis in Ogun State, Nigeria. BMC Infect Dis 2008, 8:74.

2. Kim A, Martinez A, Klausner J, Goldenson J, Kent C, Liska S, McFarland W: Use of sentinel surveillance and geographic information systems to monitor trends in HIV prevalence, incidence, and related risk behavior among women undergoing syphilis screening in a jail setting. J Urban Health 2008, 86(1):79-92.

3. Aylin P, Maheswaran R, Wakefield J, Cockings S, Jarup L, Arnold R, Wheeler G, Elliott P: A national facility for small area disease mapping and rapid initial assessment of apparent disease clusters around a point source: the UK Small Area Health Statistics Unit. J Public Health 1999, 21(3):289-298.


4. Richards TB, Croner CM, Rushton G, Brown CK, Fowler L: Geographic information systems and public health: mapping the future. Public Health Rep 1999, 114(4):359-360.

5. Rushton G: Public health, GIS, and spatial analytic tools. Annu Rev Public Health 2003, 24:43-56.

6. Langran G: Time in Geographic Information Systems. New York: Taylor & Francis 1992.

7. Population Data BC: Setting a health geomatics education and training agenda. Victoria 2008.

8. Sonesson C, Bock D: A review and discussion of prospective statistical surveillance in public health. J R Stat Soc Ser A 2003, 166(1):5-21.

9. Lawson AB, Kleinman K: Spatial and Syndromic Surveillance for Public Health. West Sussex: John Wiley 2005.

10. Wagner MM, Moore AW, Aryel RM: Handbook of Biosurveillance. London: Elsevier 2006.

11. Robertson C, Nelson TA, MacNab YC, Lawson AB: Review of methods for space-time disease surveillance. Spat Spatiotemporal Epidemiol.

12. Rogerson PA, Yamada I: Monitoring change in spatial patterns of disease: comparing univariate and multivariate cumulative sum approaches. Stat Med 2004, 23(14):2195-2214.

13. Rogerson P: Surveillance systems for monitoring the development of spatial patterns. Stat Med 1997, 16(18):2081-2093.

14. Kulldorff M: Prospective time periodic geographical disease surveillance using a scan statistic. J R Stat Soc Ser A 2001, 164(1):61-72.

15. Kulldorff M, Heffernan R, Hartman J, Assuncao R, Mostashari F: A space-time permutation scan statistic for disease outbreak detection. PLoS Med 2005, 2(3):e59.

16. Vidal Rodeiro CL, Lawson AB: Monitoring changes in spatio-temporal maps of disease. Biom J 2006, 48(3):463-480.

17. Kleinman K, Lazarus R, Platt R: A generalized linear mixed models approach for detecting incident clusters of disease in small areas, with an application to biological terrorism. Am J Epidemiol 2004, 159(3):217-224.

18. Kleinman KP, Abrams AM, Kulldorff M, Platt R: A model-adjusted space-time scan statistic with an application to syndromic surveillance. Epidemiol Infect 2005, 133(03):409-419.

19. Reis BY, Kohane IS, Mandl KD: An epidemiological network model for disease outbreak detection. PLoS Med 2007, 4(6):e210.

20. Kim Y, O'Kelly M: A bootstrap based space–time surveillance model with an application to crime occurrences. J Geograph Systems 2008, 11(2):155-173.

21. Watkins R, Eagleson S, Veenendaal B, Wright G, Plant A: Disease surveillance using a hidden Markov model. BMC Med Inform Decis Mak 2009, 9(1):39.

22. Tsui FC, Espino JU, Dato VM, Gesteland PH, Hutman J, Wagner MM: Technical description of RODS: a real-time public health surveillance system. J Am Med Inform Assoc 2003, 10(5):399-408.

23. Reis BY, Kirby C, Hadden LE, Olson K, McMurry AJ, Daniel JB, Mandl KD: AEGIS: A Robust and scalable real-time public health surveillance system. J Am Med Inform Assoc 2007, 14(5):581-588.

24. Gotway C, Young L: Combining incompatible spatial data. J Am Stat Assoc 2002, 97:632-648.

25. Perry A: Forecasting hospital emergency department visits for respiratory illness using Ontario's telehealth system: an application of real-time syndromic surveillance to forecasting health services demand. MSc Thesis Queen's University, Department of Community Health and Epidemiology 2009.

26. Block R: Scanning for clusters in space and time. Soc Sci Comput Rev 2007, 25:272-278.

27. Kulldorff M, Nagarwalla N: Spatial disease clusters: detection and inference. Stat Med 1995, 14:799-810.

28. Nagarwalla N: A scan statistic with a variable window. Stat Med 1996, 15(7-9):845-850.

29. Knox E: The detection of space-time interactions. Appl Stat 1964, 13:25-29.

30. Mantel N: The detection of disease clustering and a generalized regression approach. Cancer Res 1967, 27:209-220.

31. Jacquez G: A k nearest neighbour test for space-time interaction. Stat Med 1996, 15:1935-1949.

32. Farrington CP, Andrews N, Beale AD, Catchpole MA: A statistical algorithm for the early detection of outbreaks of infectious disease. J R Stat Soc Ser A 1996, 159(3):547-563.

33. Rossi G, Lampugnani L, Marchi M: An approximate CUSUM procedure for surveillance of health events. Stat Med 1999, 18(16):2111-2122.

34. Held L, Hofmann M, Hohle M, Schmid V: A two-component model for counts of infectious diseases. Biostatistics 2006, 7(3):422-437.

35. Armbrust M, Fox A, Griffith R, Joseph AD, Katz RH, Konwinski A, Lee G, Patterson DA, Rabkin A, Stoica I: Above the clouds: A Berkeley view of cloud computing. EECS Department, University of California, Berkeley, Tech. Rep. UCB/EECS-2009-28 2009.

36. RWeb. [http://www.math.montana.edu/Rweb].

37. Horner J: rapache: Web application development with R and Apache. 2009 [http://biostat.mc.vanderbilt.edu/rapache/].

38. Kulldorff M, Information Management Services Inc: SaTScan™ v8.0: Software for the spatial and space-time scan statistics. 2009 [http://www.satscan.org].

39. Jacquez GM, Greiling DA, Durbeck H, Estberg L, Do E, Long A, Rommel B: ClusterSeer User Guide 2: Software for identifying disease clusters. Ann Arbor, MI: TerraSeer Press 2002.

40. Yamada I, Rogerson P, Lee G: GeoSurveillance: a GIS-based system for the detection and monitoring of spatial clusters. J Geograph Systems 2009, 11(2):155-173.

41. Höhle M: Surveillance: An R package for the monitoring of infectious diseases. Comput Stat 2007, 22(4):571-582.

doi:10.1186/1476-072X-9-16
Cite this article as: Robertson and Nelson: Review of software for space-time disease surveillance. International Journal of Health Geographics 2010 9:16.
