Combining data from multiple monitors in air pollution mortality time series studies

Atmospheric Environment 37 (2003) 3317–3322

Steven Roberts*

Department of Statistics, Stanford University, Sequoia Hall, 390 Serra Mall, Stanford, CA 94305-4065, USA

Received 29 October 2002; accepted 30 March 2003

Abstract

In community time series studies on the effect of particulate air pollution on mortality, particulate air pollution data

are often available from multiple monitors. Published studies have combined the data from the multiple monitors using

a simple or trimmed average. In this paper, I describe an alternative method for combining the particulate air pollution

data available from multiple monitors. Using time series data from Cook County, Illinois, the proposed method yields

more meaningful, biologically plausible particulate air pollution mortality effect estimates without loss of statistical

accuracy, and reduces the full set of monitors to a smaller set without loss of information. The proposed monitor

combination may be more highly correlated with the average population exposure to particulate air pollution than a

simple or trimmed average of the monitors.

© 2003 Elsevier Ltd. All rights reserved.

Keywords: PM10; Air pollution; Time series; Monitoring; Mortality

1. Introduction

There have been numerous community time series

studies on the effect of air pollution on mortality

(Dominici et al., 2000, 2002a; Ito et al., 1995; Samet

et al., 2000a; Schwartz and Zanobetti, 2000; Smith et al.,

1999). These studies typically fit a generalized additive

model (GAM) (Hastie and Tibshirani, 1990) or general-

ized linear model (GLM) (McCullagh and Nelder, 1989)

to time series of daily mortality, air pollution and

meteorological covariates. The fitted models are then

used to quantify the effect an increase in air pollution

has on mortality.

Many of these community air pollution mortality time

series studies have been carried out in regions that have

multiple air pollution monitoring sites. These studies

typically combine the multiple monitors into a single air

pollution time series using a simple or trimmed average.

In this paper, I investigate an alternative method of

combining the data from the multiple air pollution

monitoring sites. Each site is allowed its own co-efficient

for air pollution mortality. Then I maximize a con-

strained likelihood for the observed mortality time

series. The resulting monitor combination will be shown

to have advantages over using a simple or trimmed

average of the monitors.

2. Data

The data used to illustrate the methods of this paper

are PM10 time series from Cook County, Illinois for the

period 1987–1994. The PM10 time series measures the

ambient 24-h concentration of particulate matter of less

than 10 μm in diameter. These are the same Cook

County data used in Samet et al. (2000a).

The mortality time series are non-accidental daily

deaths of individuals aged 65 and over. I also use time

series of the daily 24-h mean temperature and mean dew

point temperature. The temperature and dew point data

are based on hourly observations taken at O’Hare

International Airport. For more information on how the

various time series were constructed refer to Samet et al.

(2000a).

*Fax: +1-650-725-8977.

E-mail address: [email protected] (S. Roberts).

1352-2310/03/$ - see front matter © 2003 Elsevier Ltd. All rights reserved.

doi:10.1016/S1352-2310(03)00289-9

Pollution data were obtained from the United States

Environmental Protection Agency’s Aerometric Infor-

mation Retrieval System (AIRS). In Cook County, there

were six PM10 monitoring sites in operation for the full

period 1987–1994. Fig. 1 shows the location of the six

monitors. The distance between monitors 1 and 4 is

17.7 km. Four of the six monitors had PM10 readings

every six days, taken on the same day. The other two

monitors had daily PM10 data. I extracted the relevant

every-six-day readings from the two daily recording

monitors to harmonize these daily time series with the

other four series. In the recent 90 cities’ study (Samet

et al., 2000b) most cities had PM10 monitors in

operation on the same six day collection schedule.

Table 1 contains a summary of the data available, within

the every-six-day collection schedule, for each of the six

monitors, as well as qualitative information about the

location of the monitors. Remaining missing values in

the six day collection schedule, for a given monitor, were

imputed using linear regression with the other monitors

as predictors. (In all, I used six monitoring sites each

with 487 PM10 values during the study period.)
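As a rough check on the series length, the every-six-day schedule can be sketched in Python. The schedule start date (1 January 1987) and the function names are assumptions made for illustration; the text does not state on which day the collection schedule began.

```python
from datetime import date, timedelta

def six_day_schedule(start, end, step_days=6):
    """Return every `step_days`-th date from `start` through `end`, inclusive."""
    days = []
    current = start
    while current <= end:
        days.append(current)
        current += timedelta(days=step_days)
    return days

def subsample_daily(daily_by_date, schedule):
    """Keep only the scheduled days from a {date: value} daily series.

    A scheduled day with no reading is returned as None (a missing value).
    """
    return [daily_by_date.get(d) for d in schedule]

# Assumed schedule start; the actual collection days are not given in the text.
schedule = six_day_schedule(date(1987, 1, 1), date(1994, 12, 31))
print(len(schedule))  # 487
```

The period 1987–1994 spans 2922 days, which is exactly 6 × 487, so any start offset within the first six days gives the 487 scheduled days per monitor reported above.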

Each of the six PM10 time series was standardized to

have zero mean and unit variance. This was done to

avoid problems that may arise due to differences in the

scale of PM10 variation among the six PM10 monitors. If

the PM10 time series were not standardized then

doubling the PM10 readings from one of the monitors

would halve the PM10 mortality effect estimate for that

monitor. This in turn would make this PM10 monitor

seem less important in describing the PM10 mortality

relationship. Standardizing the PM10 time series re-

moves this problem of interpretation.
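A minimal Python sketch of the standardization step, using made-up readings, shows why rescaling one monitor no longer matters afterwards. The use of the sample (n − 1) standard deviation is an assumption; the text does not say which divisor was used.

```python
import statistics

def standardize(series):
    """Rescale a series to zero mean and unit (sample) variance."""
    mean = statistics.mean(series)
    sd = statistics.stdev(series)
    return [(x - mean) / sd for x in series]

readings = [22.0, 35.0, 41.0, 18.0, 29.0]   # made-up PM10 values
doubled = [2 * x for x in readings]         # same monitor on a doubled scale

z1 = standardize(readings)
z2 = standardize(doubled)

# Doubling the raw readings leaves the standardized series unchanged,
# so the fitted coefficient for this monitor would not be halved.
print(all(abs(a - b) < 1e-12 for a, b in zip(z1, z2)))  # True
```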

3. Methodology

In many recent community time series studies on the

effect of PM10 on mortality, an additive model of the

following form is fit to the time series of observed

mortality:

\[ f(\mu_t) = \mu + \mathrm{confounders}_t + \beta\,\mathrm{PM}_t, \tag{1} \]

where t is the observation day, $\mu_t$ the mean number of

deaths on day t, $\mu$ the intercept term, f a link function

(typically f will be the square-root, log or identity

function), $\mathrm{confounders}_t$ the other time-varying variables

which are related to daily mortality (typical confounders

include temperature, humidity, longer term mortality

trends and seasonality), $\mathrm{PM}_t$ the PM10 concentration

time series, and $\beta$ the effect of PM10 on mortality (it gives the

increase in $f(\mu_t)$ per unit increase in PM10).

Fig. 1. Map of Cook County, Illinois showing the location of the six PM10 monitors (numbered 1–6; map labels: Chicago, Lake Michigan, Arlington Heights). The distance between monitors 1 and 4 is 17.7 km.

Table 1
Summary of data available from the Cook County, Illinois PM10 monitors, within the six day collection schedule, for the period 1987–1994, inclusive

Monitor  Years  Days of data  % Missing  Land use  Location setting
1        87–94  463           5          CO        SU
2        87–94  440           10         IN        SU
3        87–94  455           7          CO        CI
4        87–94  417           14         CO        SU
5        87–94  469           4          IN        SU
6        87–94  457           6          RE        SU

Monitor gives the monitor number. Years gives the years the monitor was in operation. Days of data gives the number of days the monitor collected a PM10 concentration. % Missing gives the percentage of days the monitor was missing a PM10 concentration. Land use is the prevalent land use within 1/4 mile of the monitor. Key: CO, commercial; IN, industry; RE, residential. Location setting is the type of environment in which the monitor is located. Key: CI, city; SU, suburban.

In Dominici et al. (2000, 2002a) and Samet et al. (2000a),

the link f is taken to be the log function, the confounder

adjustments use smooth non-parametric functions and

indicator variables, and $\mathrm{PM}_t$ is taken to be either the

current day or a previous day PM10 concentration.

Davis et al. (1996) take the link f to be the square root

function, adjust for the confounders parametrically, and

take $\mathrm{PM}_t$ to be the average of the current and previous two days’

PM10 concentrations. The $\beta$ value from these studies was

used to quantify the effect of PM10 on mortality.

When there are multiple PM10 monitors the data from

these monitors are typically combined into one PM10

time series. Dominici et al. (2000) and Samet et al.

(2000a) first adjust the values at each monitor by its

respective yearly average. Then for each day a 10%

trimmed average of the adjusted series is used to

represent the area-wide PM10 for that day. Davis et al.

(1996) combine the data from the multiple monitors

using a simple average. The resulting single PM10 time

series is then used in model (1).
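The two ways of combining one day's monitor readings can be sketched in Python as follows. The readings are made up, the yearly-average adjustment is omitted, and the trimming convention (dropping `int(proportion * n)` values from each tail) is an assumption, since the cited studies' exact convention is not given here.

```python
def simple_average(values):
    """Plain mean of the monitor readings for one day."""
    return sum(values) / len(values)

def trimmed_mean(values, proportion=0.10):
    """Mean after dropping int(proportion * n) values from each tail."""
    ordered = sorted(values)
    cut = int(proportion * len(ordered))
    kept = ordered[cut:len(ordered) - cut]
    return sum(kept) / len(kept)

# Ten made-up same-day readings, one of them unusually high
day = [12.0, 14.0, 15.0, 15.0, 16.0, 17.0, 18.0, 19.0, 21.0, 60.0]
print(simple_average(day))       # 20.7
print(trimmed_mean(day, 0.10))   # 16.875
```

Under this convention, a 10% trim applied to only six monitors drops nothing from either tail, so the trimmed and simple averages coincide in that case.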

In an air pollution mortality time series study with k

PM10 monitors, model (1) may be extended to

\[ f(\mu_t) = \mu + \mathrm{confounders}_t + \beta_1\,\mathrm{PM}_{t1} + \beta_2\,\mathrm{PM}_{t2} + \cdots + \beta_k\,\mathrm{PM}_{tk}, \tag{2} \]

where $\mathrm{PM}_{ts}$ represents the standardized PM10 time series

for monitor s.

The PM10 component from model (2) can be re-written

in the following form:

\[ \sum_{s=1}^{k} \beta_s\,\mathrm{PM}_{ts} = \beta \left( \sum_{s=1}^{k} \alpha_s\,\mathrm{PM}_{ts} \right), \tag{3} \]

where $\beta = \sum_{s=1}^{k} \beta_s$ and $\sum_{s=1}^{k} \alpha_s = 1$.

In this form, we can see that the PM10 coefficients can

be written as a weighted average with weights, $\alpha_s$, that

sum to one, and $\beta$ is the increase in the mortality index

$f(\mu_t)$ if the PM10 level at each monitor is one standard

deviation above its mean level. Studies that use an

average of the available PM10 monitors assume that the

$\alpha_s$ values are all equal.
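Eq. (3) implies that each weight is the monitor's coefficient divided by the sum of the coefficients, whenever that sum is non-zero. A small Python sketch with made-up coefficients (the function name is illustrative only):

```python
def combine_coefficients(betas):
    """Split per-monitor coefficients beta_s into (beta, alphas) as in Eq. (3).

    beta is the sum of the beta_s and alpha_s = beta_s / beta, so the
    weights sum to one by construction.
    """
    beta = sum(betas)
    if beta == 0:
        raise ValueError("weights are undefined when the coefficients sum to zero")
    alphas = [b / beta for b in betas]
    return beta, alphas

# Made-up coefficients for three monitors; the second monitor gets zero weight
beta, alphas = combine_coefficients([0.006, 0.0, 0.004])
print(beta, alphas)
```

With equal coefficients the weights are all 1/k, which is the case implicitly assumed when a simple average of the monitors is used.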

I further constrain the $\beta_s$ to be non-negative. This

constraint implies that increases in PM10 cannot

decrease the expected mortality rate, i.e. the $\beta_s$ are

constrained to be biologically plausible. Model (2) can

also be fit without placing constraints on the $\beta_s$. It will

be shown that the increase in bias, by constraining the $\beta_s$

to be non-negative, is offset by the decrease in variance.

Constraining the $\beta_s$ has other desirable properties that

will be discussed later.

Once the form of model (2) is chosen, the model

parameters are fit using a constrained maximum like-

lihood. I obtain the constrained maximum likelihood

estimates iteratively in two repeated steps. The first step

fixes the parameters corresponding to the PM10 terms,

the $\beta_s$, at their current values and estimates the

parameters corresponding to the confounders. This can

be done using GLM software. The second step fixes the

parameters corresponding to the confounders at their

current values and estimates the parameters correspond-

ing to the PM10 terms. This can be done using

constrained optimization software. These two steps are

iterated to obtain the final estimates of $\beta_s$. The iterative

method is summarized as follows:

1. Choose starting values for $\beta_s$.

2. Fix $\beta_s$. Estimate the confounder parameters.

3. Fix the confounder parameters. Re-estimate $\beta_s$.

4. Fix $\beta_s$. Re-estimate the confounder parameters.

5. Repeat steps 3 and 4 until the PM10 effect estimates, the $\beta_s$, converge.
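The five steps can be sketched in Python under simplifying assumptions that are not from the paper: the model is Poisson with a log link, the only "confounder" parameter is the intercept (so steps 2 and 4 have a closed form rather than a GLM fit), and step 3 uses clipped coordinate-wise Newton updates with step halving as a stand-in for general constrained optimization software. All function names are illustrative.

```python
import math

def linear_pm(beta, x_row):
    """PM10 part of the linear predictor: sum over s of beta_s * PM_ts."""
    return sum(b * x for b, x in zip(beta, x_row))

def loglik(y, X, mu, beta):
    """Poisson log-likelihood with log link, omitting the constant log(y!) term."""
    total = 0.0
    for y_t, x_row in zip(y, X):
        eta = mu + linear_pm(beta, x_row)
        total += y_t * eta - math.exp(eta)
    return total

def fit_intercept(y, X, beta):
    """Steps 2 and 4: with beta fixed, the intercept MLE has a closed form."""
    return math.log(sum(y) / sum(math.exp(linear_pm(beta, x_row)) for x_row in X))

def update_beta(y, X, mu, beta):
    """Step 3: with the intercept fixed, one sweep of clipped Newton steps."""
    beta = list(beta)
    for s in range(len(beta)):
        grad = hess = 0.0
        for y_t, x_row in zip(y, X):
            m = math.exp(mu + linear_pm(beta, x_row))
            grad += (y_t - m) * x_row[s]
            hess += m * x_row[s] ** 2
        if hess == 0.0:
            continue
        old, step, base = beta[s], grad / hess, loglik(y, X, mu, beta)
        for _ in range(30):
            beta[s] = max(0.0, old + step)      # enforce beta_s >= 0
            if loglik(y, X, mu, beta) >= base:  # accept only non-decreasing moves
                break
            step /= 2                           # back off if the likelihood fell
        else:
            beta[s] = old                       # no acceptable step found
    return beta

def constrained_mmm(y, X, tol=1e-8, max_iter=500):
    """Alternate the two estimation steps until the beta_s stop changing."""
    beta = [0.0] * len(X[0])                    # step 1: starting values
    mu = fit_intercept(y, X, beta)              # step 2
    for _ in range(max_iter):
        new_beta = update_beta(y, X, mu, beta)  # step 3
        mu = fit_intercept(y, X, new_beta)      # step 4
        change = max(abs(a - b) for a, b in zip(new_beta, beta))
        beta = new_beta
        if change < tol:                        # step 5: convergence check
            break
    return mu, beta
```

Here `y` is a list of daily death counts and `X` a list of rows of standardized monitor readings. Because each accepted move leaves the likelihood no lower than before, the sketch mirrors the non-decreasing likelihood property of the iterative procedure; a coefficient that ends at exactly 0.0 corresponds to a monitor given zero weight.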

At each step, the parameters are being estimated using

maximum likelihood. The constrained likelihood is

bounded and the steps in the iterative estimation

procedure cannot decrease the constrained likelihood.

Thus the iterative estimation procedure will converge.

This method will be referred to as the constrained

multiple monitor method (constrained MMM). The

final $\beta_s$ values can then be used to form $\beta$ and $\alpha_s$, as

defined in Eq. (3). The $\alpha_s$ can be used to identify a subset

of the monitors that capture as much PM10 mortality

variation as all the monitors combined. If the $\alpha_s$ values

for some monitors are zero, or near-zero, then constrained

MMM is re-fit using a reduced set of monitors.

Constrained MMM can be extended to include co-pollutants

by including co-pollutants in model (2) in the

same way as PM10.

4. Simulations

I use simulations to compare constrained MMM to

using a simple average of the monitors (MAS) or an

unconstrained MMM (MMM fit without constraining

the bs to be non-negative).

In the simulations, it is assumed there is an underlying

unobserved PM10 time series that is related to daily

mortality. The PM10 monitors are more or less

correlated with the underlying PM10 time series. The

goal is to estimate, using the PM10 time series from the

monitoring sites, the effect of the underlying PM10 time

series on mortality. For the simulation study, I took the

underlying PM10 time series to be the Cook County

monitor 1 time series, denoted by $\mathrm{PM}^u_t$. The remaining

five Cook County monitors are then used to estimate the

PM10 mortality effect, via constrained MMM, MAS or

unconstrained MMM.

Using $\mathrm{PM}^u_t$, the log mean of the simulated Poisson

mortality time series is given by $\log(\mu_t) = \log(83) + \phi\,\mathrm{PM}^u_t$.

The average daily mortality in Cook County was

83. $\phi$ is the mortality effect of the underlying PM10 time

series. The $\phi$ values used in the simulations span the

plausible range for the PM10 mortality effect. For

simplicity, confounders were not included in the

simulations.
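One simulated mortality series of this kind can be generated with the Python standard library as sketched below. The Gaussian stand-in for the standardized monitor 1 series, the seeds, and the function names are assumptions for illustration; the actual monitor data are not reproduced here.

```python
import math
import random

def poisson_draw(rng, lam):
    """Draw one Poisson(lam) variate with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def simulate_mortality(pm_u, phi, baseline=83.0, seed=0):
    """Simulate daily deaths with log(mu_t) = log(baseline) + phi * PM^u_t."""
    rng = random.Random(seed)
    return [poisson_draw(rng, baseline * math.exp(phi * x)) for x in pm_u]

# Stand-in for the standardized underlying PM10 series (487 scheduled days)
pm_rng = random.Random(1)
pm_u = [pm_rng.gauss(0.0, 1.0) for _ in range(487)]

deaths = simulate_mortality(pm_u, phi=0.01)
print(len(deaths), sum(deaths) / len(deaths))
```

Repeating the call with 200 different seeds for each value of the effect would give one set of simulated series per effect size, in the spirit of the design described here.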

For each value of $\phi$, I generated 200 mortality time

series of length 487 days, to match the length of the

Cook County data set. Constrained MMM, MAS and

unconstrained MMM were then applied to the generated

mortality time series to estimate $\phi$. For each value of $\phi$,

Table 2 shows the root mean squared error (RMSE)

over the 200 simulations. In each case, the RMSE for

constrained MMM is similar to the RMSE for MAS or

unconstrained MMM. The larger bias of the constrained

MMM estimator is compensated for by a decrease in

estimation variance. (These simulations were repeated

using each of the other five monitors as the underlying

PM10 time series. In each case, results similar to those

for monitor 1 were obtained.)
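The bias and variance trade-off can be checked from the summary columns of Table 2, since the squared RMSE is approximately the squared bias plus the squared SD (exactly so when the SD uses divisor n). A short Python sketch using the rounded table values for a true effect of 0:

```python
import math

def rmse_from_summary(mean_estimate, sd_estimate, truth):
    """RMSE implied by the mean and SD of the estimates: sqrt(bias^2 + SD^2)."""
    bias = mean_estimate - truth
    return math.sqrt(bias ** 2 + sd_estimate ** 2)

# Table 2, true effect 0: constrained MMM, then MAS
print(round(rmse_from_summary(0.0036, 0.0038, 0.0), 4))    # 0.0052
print(round(rmse_from_summary(-0.0004, 0.0058, 0.0), 4))   # 0.0058
```

Constrained MMM has the larger bias (0.0036 against −0.0004) but the smaller SD (0.0038 against 0.0058), and the two effects roughly cancel in the RMSE.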

Another advantage of constrained MMM is that it

can assign zero weight to a monitor. This allows the full

set of monitors to be reduced to a smaller set, for these

purposes without loss of information. The last column

in Table 2, 0 Weight, contains the median number of

monitors that were given exactly zero weight by

constrained MMM.

5. Application to Cook County, Illinois

In this section, constrained MMM is applied to the

actual mortality time series and PM10 monitoring data

from Cook County, as described in Section 2. I fit a

Poisson log-linear model similar to those used in

Dominici et al. (2002b, 2003). The model I used was

\[
\log(\mu_t) = \mu + S_{t1}(\mathrm{time}, 8/\mathrm{year}) + S_{t2}(\mathrm{temp}_0, 6) + S_{t3}(\mathrm{temp}_{1\text{--}3}, 6) + S_{t4}(\mathrm{dew}_0, 3) + S_{t5}(\mathrm{dew}_{1\text{--}3}, 3) + \gamma\,\mathrm{DOW}_t + \beta\,\mathrm{PM}_t, \tag{4}
\]

where t is the observation day, $\mu_t$ the mean number of

deaths on day t, $\mu$ the intercept term, $S_{t1}(\mathrm{time}, 8/\mathrm{year})$ a

smooth function of time with eight degrees of freedom

per year (the smooth function is represented using

natural cubic splines), $S_{t2}(\mathrm{temp}_0, 6)$ and $S_{t3}(\mathrm{temp}_{1\text{--}3}, 6)$

smooth functions with a total of six degrees of freedom,

$\mathrm{temp}_0$ the current day’s mean 24-h temperature and

$\mathrm{temp}_{1\text{--}3}$ the average of the previous three days’ 24-h

mean temperatures (the smooth functions are represented

using natural cubic splines), $S_{t4}(\mathrm{dew}_0, 3)$ and

$S_{t5}(\mathrm{dew}_{1\text{--}3}, 3)$ similar functions for the 24-h mean dew

point temperature, $\mathrm{DOW}_t$ a set of indicator variables for

the day of the week, and $\gamma$ a vector of coefficients that

contains the mortality adjustments for the day of the

week.
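The lagged covariates in model (4), the average over the previous three days, can be computed from a daily series as in the following Python sketch. The temperatures are made up and the function name is illustrative; returning None for days without three earlier values is an assumption about edge handling.

```python
def lagged_mean(series, first_lag=1, last_lag=3):
    """Average of the values first_lag..last_lag days before each day.

    Days without a full set of earlier values are returned as None.
    """
    out = []
    for t in range(len(series)):
        if t < last_lag:
            out.append(None)
        else:
            window = [series[t - lag] for lag in range(first_lag, last_lag + 1)]
            out.append(sum(window) / len(window))
    return out

temps = [10.0, 12.0, 14.0, 16.0, 18.0]   # made-up daily mean temperatures
print(lagged_mean(temps))  # [None, None, None, 12.0, 14.0]
```

The same function applies to the dew point series; the lagged values would be computed on the full daily record before keeping only the scheduled PM10 days.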

The PM10 time series used here is the current day

PM10 level. An exploratory analysis suggested that the

current day PM10 gives more significant results than

using either the day before or two days before PM10

level. I do not account for this selection effect.

Table 3 contains the results of $\beta$ estimation by several

methods:

1. Single PM10 monitors.

2. The simple average of the six PM10 monitors (MAS).

3. Constrained MMM.

4. Unconstrained MMM.

The first six rows correspond to the six monitors used

individually. For rows 1–6, Estimate is the estimated

percentage increase in mortality for a one standard

deviation increase in the PM10 level at that monitor. For

MAS, constrained MMM, and unconstrained MMM,

Estimate is the estimated percentage increase in

mortality if the PM10 level increases by one standard

deviation at each of the six monitors. The estimated

percentage increase in mortality is given by

$100(e^{\beta} - 1) \approx 100\beta$. SE is the standard error of the estimated

percentage increase in mortality. The standard error for

constrained MMM is calculated using a parametric

bootstrap. Using the parameter estimates from constrained

MMM, 100 new mortality time series are

simulated; constrained MMM is then applied to each

simulated mortality time series to obtain a $\beta$ estimate.

The standard error is calculated as the standard

deviation of the 100 constrained MMM $\beta$ estimates.

The weights $\alpha_s$ assigned to the six monitors using

constrained MMM are shown in column 4. The zero

weights given to monitors 1, 3, 5 and 6 are exact;

they are not near-zero weights that have been

rounded to zero.

Table 2
Results of simulations comparing constrained MMM (MMM), simple average of the six monitors (MAS) and unconstrained MMM (MMMuc), using monitor 1 as the underlying PM10 time series

$\phi$   Model   Mean      SD      RMSE    0 Weight
0        MMMuc   −0.0005   0.0059  0.0059
         MMM      0.0036   0.0038  0.0052  4
         MAS     −0.0004   0.0058  0.0058
0.005    MMMuc    0.0043   0.0056  0.0057
         MMM      0.0070   0.0044  0.0049  3
         MAS      0.0043   0.0056  0.0056
0.01     MMMuc    0.0094   0.0054  0.0054
         MMM      0.0115   0.0050  0.0052  3
         MAS      0.0095   0.0054  0.0054
0.02     MMMuc    0.0188   0.0057  0.0058
         MMM      0.0203   0.0053  0.0053  2
         MAS      0.0190   0.0057  0.0057
0.03     MMMuc    0.0294   0.0051  0.0051
         MMM      0.0304   0.0048  0.0048  2
         MAS      0.0296   0.0051  0.0051
0.06     MMMuc    0.0587   0.0059  0.0060
         MMM      0.0594   0.0058  0.0058  1
         MAS      0.0592   0.0059  0.0059

$\phi$ is the effect of the underlying PM10 time series on mortality. Model is the model that is being fit to the simulated mortality time series in order to estimate $\phi$. Mean is the mean of the 200 $\phi$ estimates. SD is the standard deviation of the 200 $\phi$ estimates. RMSE is the root mean squared error of the 200 $\phi$ estimates. 0 Weight is the median number of monitors that received exactly zero weight from constrained MMM.
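The parametric bootstrap used for the constrained MMM standard error in Section 5 can be sketched generically in Python. To keep the example short, the estimator below is a toy stand-in (the log of the average count), not constrained MMM, and the constant fitted means are made up; in the setting described here the estimator would re-fit constrained MMM to each simulated series.

```python
import math
import random
import statistics

def poisson_draw(rng, lam):
    """Draw one Poisson(lam) variate with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def parametric_bootstrap_se(fitted_means, estimator, n_boot=100, seed=0):
    """SD of `estimator` over series simulated from the fitted Poisson means."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        simulated = [poisson_draw(rng, m) for m in fitted_means]
        estimates.append(estimator(simulated))
    return statistics.stdev(estimates)

def log_mean(series):
    """Toy stand-in estimator (NOT constrained MMM): log of the average count."""
    return math.log(sum(series) / len(series))

# Made-up fitted means: 100 days with an expected 83 deaths each
se = parametric_bootstrap_se([83.0] * 100, log_mean)
print(se)
```

For this toy estimator the theoretical standard error is about 1/sqrt(100 × 83), roughly 0.011, which gives a sanity check on the bootstrap value.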

The results for the six single monitor models show

that the location of the monitor can have an important

bearing on the PM10 mortality effect estimate. For

example, monitors 2 and 4 give PM10 mortality effect

estimates that are larger than the PM10 mortality effect

estimates from the other four monitors. This is similar to

what Ito et al. (1995) found. This raises an important

issue for air pollution mortality studies, particularly in

areas with only a small number of monitors. In such

regions, these results show that the estimate for the

effect of PM10 on mortality may depend on the monitor

location even after standardizing. The constrained

MMM weights suggest that monitors 2 and 4 will

capture as much of the PM10 mortality variation as all

six monitors. Fig. 1 shows the location of these two

monitors.

Using monitors 2 and 4 alone, model (4) was fit using

the standard GLM estimating procedure (GLMstd)

which does not constrain the $\beta_s$. The estimated $\beta_s$ from

GLMstd are essentially the same as those obtained by

the constrained MMM iterative estimation procedure,

which constrains the $\beta_s$. This provides a check that the

iterative constrained MMM estimation procedure is

working correctly.

6. Discussion and conclusion

In this paper, I explored issues related to multiple

PM10 monitors in community PM10 mortality time series

studies. I showed, using data from Cook County, how

the PM10 mortality effect estimate depends on the

location of the monitor. Next, I applied a method of

optimally combining the PM10 data available from

multiple monitors (constrained MMM), and compared

it to using an average of monitors (MAS) and

unconstrained MMM. Based on a simulation study,

constrained MMM mortality estimates appear to have

the following advantages:

1. More meaningful biologically plausible PM10 mor-

tality effect estimates without loss of statistical

accuracy.

2. The full set of monitors can be reduced to a smaller

set, for the purposes of estimating PM10 mortality

effects, without losing information.

If a monitor is given zero weight by constrained MMM

it does not mean this monitor is less important than

another monitor that is given non-zero weight for

describing the PM10 mortality relationship. For exam-

ple, consider a study with three PM10 monitors where

monitors 1 and 2 are highly correlated, and these two

monitors are more strongly associated with community

mortality than monitor 3. In this situation, constrained

MMM may give non-zero weight to monitors 1 and 3,

and zero weight to monitor 2. This does not mean

monitor 3 is more important than monitor 2. It only

means that in the presence of monitor 1, monitor 3

describes more of the remaining PM10 mortality varia-

tion than monitor 2.

Simulations showed that there was no loss in

statistical estimation precision that results from con-

straining MMM estimates to be biologically plausible—

biases in constrained MMM are compensated for by

correspondingly lower estimation variance.

In a community study of the effect of PM10 on

mortality, the relevant PM10 variable should be the

average population exposure to PM10, which is not

directly available. Instead, the ambient PM10 concentration

is measured at multiple monitoring sites. Using the

PM10 data from the multiple monitoring sites, it is

desirable to form a new PM10 time series that is closely

correlated to the average population exposure to PM10.

Constrained MMM, which assigns data-based

weights to each monitor, gives a larger PM10 mortality

effect estimate than a simple average of the monitors.

This suggests that a combined PM10 time series obtained

from monitors 2 and 4 might be more highly correlated

with the average population exposure to PM10 than the

average of all monitors. It would be of interest to

explore this hypothesis using a personal exposure

survey.

Table 3
Results of fitting the single monitor models, simple average of the six monitors (MAS), unconstrained MMM (MMMuc) and constrained MMM (MMM) to the Cook County, Illinois mortality data

Monitor/Model  Estimate  SE    Wgt MMM
1              1.07      0.72  0
2              1.94      0.70  0.49
3              0.81      0.79  0
4              1.97      0.70  0.51
5              1.17      0.72  0
6              0.34      0.69  0
MAS            1.83      0.88
MMMuc          1.96      0.88
MMM            2.68      0.80

Monitor/Model gives the monitor or model that is being used. Estimate gives the PM10 mortality effect estimate for the monitor or model in question. SE is the standard error for the PM10 mortality effect estimate. Wgt MMM gives the exact weights assigned to the individual monitors by constrained MMM.

The question of which PM10 monitors are representa-

tive of the population’s exposure to PM10 has been

raised as an open question by Clyde (2000). I believe the

method developed here is a step forward in answering

this question. Using constrained MMM, I have shown

that it may be possible to reduce the full set of PM10

monitors to a smaller set without losing much informa-

tion for PM10 mortality studies. Constrained MMM

applied to Cook County showed that the number of

PM10 monitors there could be reduced to only two

monitors for this purpose. This information could be

useful in a regulatory setting and possibly for shifting

monitoring resources.

Acknowledgements

I would like to thank Paul Switzer for his comments,

Allen Lefohn for the multiple monitor data and Scott

Zeger for the combined Cook County mortality and

meteorological data used in this paper. The author was

partially supported by a Stanford Graduate Fellowship.

References

Clyde, M., 2000. Model uncertainty and health effect studies for

particulate matter. Environmetrics 11, 745–763.

Davis, J.M., Sacks, J., Saltzmann, N., Smith, R.L., Styer, P.,

1996. Airborne particulate matter and daily mortality in

Birmingham, Alabama. NISS Technical Report No. 55.

Dominici, F., Samet, J.M., Zeger, S.L., 2000. Combining

evidence on air pollution and daily mortality from the

20 largest US cities: a hierarchical modelling strategy.

Journal of the Royal Statistical Society, Series A 163,

263–302.

Dominici, F., Daniels, M., Zeger, S.L., Samet, J.M., 2002a. Air

pollution and mortality: estimating regional and national

dose-response relationships. Journal of the American

Statistical Association 97, 100–111.

Dominici, F., McDermott, A., Zeger, S.L., Samet, J.M., 2002b.

On the use of generalized additive models in time series

studies of air pollution and health. American Journal of

Epidemiology 156, 193–203.

Dominici, F., McDermott, A., Hastie, T., 2003. Issues in semi-

parametric regression with applications in time series studies

for air pollution and mortality. Technical Report,

http://biosun01.biostat.jhsph.edu/~fdominic/jasafeb6.pdf.

Hastie, T.J., Tibshirani, R.J., 1990. Generalized Additive

Models. Chapman & Hall, London.

Ito, K., Kinney, P.L., Thurston, G.D., 1995. Variations in

PM10 concentrations within two metropolitan areas and

their implications for health effects analyses. Inhalation

Toxicology 7, 735–745.

McCullagh, P., Nelder, J.A., 1989. Generalized Linear Models,

2nd Edition. Chapman and Hall, London.

Samet, J.M., Dominici, F., Curriero, F.C., Coursac, I., Zeger,

S.L., 2000a. Fine particulate air pollution and mortality in

20 US Cities, 1987–1994. The New England Journal of

Medicine 343 (24), 1742–1749.

Samet, J.M., Zeger, S.L., Dominici, F., Curriero, F., Coursac,

I., Dockery, D.W., Schwartz, J., Zanobetti, A., 2000b. The

National Morbidity, Mortality, and Air Pollution Study

Part II: Morbidity and Mortality from Air Pollution in the

United States. Number 94, The Health Effects Institute,

Cambridge, MA.

Schwartz, J., Zanobetti, A., 2000. Using meta-smoothing to

estimate dose-response trends across multiple studies, with

application to air pollution and daily death. Epidemiology

11 (6), 666–672.

Smith, R.L., Davis, J.M., Speckman, P., 1999. Human health

effects of environmental pollution in the atmosphere. In:

Barnett, V., Stein, A., Turkman, F. (Eds.), Statistics in

the Environment 4: Statistical Aspects of Health and

the Environment. John Wiley, Chichester, pp. 91–115

(Chapter 6).
