Atmospheric Environment 37 (2003) 3317–3322
Combining data from multiple monitors in air pollution mortality time series studies
Steven Roberts*
Department of Statistics, Stanford University, Sequoia Hall, 390 Serra Mall, Stanford, CA 94305-4065, USA
Received 29 October 2002; accepted 30 March 2003
Abstract
In community time series studies on the effect of particulate air pollution on mortality, particulate air pollution data
are often available from multiple monitors. Published studies have combined the data from the multiple monitors using
a simple or trimmed average. In this paper, I describe an alternative method for combining the particulate air pollution
data available from multiple monitors. Using time series data from Cook County, Illinois, the proposed method yields
more meaningful, biologically plausible particulate air pollution mortality effect estimates without loss of statistical
accuracy, and it reduces the full set of monitors to a smaller set without loss of information. The proposed monitor
combination may be more highly correlated with the average population exposure to particulate air pollution than a
simple or trimmed average of the monitors.
© 2003 Elsevier Ltd. All rights reserved.
Keywords: PM10; Air pollution; Time series; Monitoring; Mortality
1. Introduction
There have been numerous community time series
studies on the effect of air pollution on mortality
(Dominici et al., 2000, 2002a; Ito et al., 1995; Samet
et al., 2000a; Schwartz and Zanobetti, 2000; Smith et al.,
1999). These studies typically fit a generalized additive
model (GAM) (Hastie and Tibshirani, 1990) or general-
ized linear model (GLM) (McCullagh and Nelder, 1989)
to time series of daily mortality, air pollution and
meteorological covariates. The fitted models are then
used to quantify the effect an increase in air pollution
has on mortality.
Many of these community air pollution mortality time
series studies have been carried out in regions that have
multiple air pollution monitoring sites. These studies
typically combine the multiple monitors into a single air
pollution time series using a simple or trimmed average.
In this paper, I investigate an alternative method of
combining the data from the multiple air pollution
monitoring sites. Each site is allowed its own air pollution mortality coefficient, and I then maximize a constrained likelihood for the observed mortality time series. The resulting monitor combination will be shown
to have advantages over using a simple or trimmed
average of the monitors.
2. Data
The data used to illustrate the methods of this paper are PM10 time series from Cook County, Illinois, for the period 1987–1994. The PM10 time series measures the ambient 24-h concentration of particulate matter less than 10 µm in diameter. These are the same Cook County data used in Samet et al. (2000a).
The mortality time series are non-accidental daily deaths of individuals aged 65 and over. I also use time series of the daily 24-h mean temperature and mean dew point temperature. The temperature and dew point data are based on hourly observations taken at O'Hare International Airport. For more information on how the various time series were constructed, refer to Samet et al. (2000a).
*Fax: +1-650-725-8977.
E-mail address: [email protected] (S. Roberts).
1352-2310/03/$ - see front matter © 2003 Elsevier Ltd. All rights reserved.
doi:10.1016/S1352-2310(03)00289-9
Pollution data were obtained from the United States Environmental Protection Agency's Aerometric Information Retrieval System (AIRS). In Cook County, there were six PM10 monitoring sites in operation for the full period 1987–1994. Fig. 1 shows the locations of the six monitors. The distance between monitors 1 and 4 is 17.7 km. Four of the six monitors had PM10 readings every six days, taken on the same days. The other two monitors had daily PM10 data. I extracted the relevant every-six-day readings from the two daily recording monitors to harmonize these daily time series with the other four series. In the recent 90-cities study (Samet et al., 2000b), most cities had PM10 monitors in operation on the same six-day collection schedule. Table 1 contains a summary of the data available, within the every-six-day collection schedule, for each of the six monitors, as well as qualitative information about the locations of the monitors. Remaining missing values in the six-day collection schedule for a given monitor were imputed using linear regression with the other monitors as predictors. (In all, I used six monitoring sites, each with 487 PM10 values during the study period.)
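The harmonization and imputation steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's code: the data are taken to be a days × monitors NumPy array with NaN for missing readings, the first row is assumed to fall on the common sampling day, and each missing value is predicted by an ordinary least-squares fit on the other monitors using only days on which the target and all other monitors are observed. The function names and the synthetic data are hypothetical.

```python
import numpy as np


def every_sixth_day(daily, start=0):
    """Keep every sixth row of a (days x monitors) array, beginning at `start`.

    `start` is the index of the first common sampling day (an assumption here)."""
    return np.asarray(daily, dtype=float)[start::6].copy()


def impute_by_regression(pm):
    """Fill NaNs in each monitor column by OLS regression on the other monitors.

    For monitor j the regression is fit on days where j and all other monitors
    are observed; predictions are made on days where j is missing but all other
    monitors are observed. Days with several monitors missing stay NaN."""
    pm = np.asarray(pm, dtype=float)
    filled = pm.copy()
    n_days, k = pm.shape
    for j in range(k):
        others = [i for i in range(k) if i != j]
        X = np.column_stack([np.ones(n_days), pm[:, others]])  # intercept + predictors
        predictors_ok = ~np.isnan(pm[:, others]).any(axis=1)
        train = predictors_ok & ~np.isnan(pm[:, j])
        to_fill = predictors_ok & np.isnan(pm[:, j])
        if to_fill.any():
            coef, *_ = np.linalg.lstsq(X[train], pm[train, j], rcond=None)
            filled[to_fill, j] = X[to_fill] @ coef
    return filled


# Hypothetical data: 1987-1994 has 2922 days, i.e. 487 six-day sampling days.
rng = np.random.default_rng(0)
regional = rng.normal(30.0, 10.0, size=2922)
daily = regional[:, None] + rng.normal(0.0, 4.0, size=(2922, 6))
sixday = every_sixth_day(daily)
missing = rng.random(sixday.shape) < 0.08
sixday[missing] = np.nan
filled = impute_by_regression(sixday)
print(sixday.shape, int(np.isnan(sixday).sum()), int(np.isnan(filled).sum()))
```

Predicting only where every other monitor is observed keeps each regression simple; with six monitors that are rarely missing together, most gaps are filled in one pass, and any remainder could be handled by repeating the step on the partially filled array.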
Each of the six PM10 time series was standardized to have zero mean and unit variance. This was done to avoid problems that may arise from differences in the scale of PM10 variation among the six PM10 monitors. If the PM10 time series were not standardized, then doubling the PM10 readings from one of the monitors would halve the PM10 mortality effect estimate for that monitor. This in turn would make that PM10 monitor seem less important in describing the PM10 mortality relationship. Standardizing the PM10 time series removes this problem of interpretation.
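The scale argument above can be checked numerically. The sketch below assumes a Poisson log-linear model with no confounders and uses synthetic data (all names hypothetical); it fits the PM10 coefficient by Newton-Raphson and shows that doubling the raw readings halves the coefficient, while the coefficient on the standardized series is unaffected.

```python
import numpy as np


def standardize(x):
    """Rescale a series to zero mean and unit variance."""
    return (x - x.mean()) / x.std()


def fit_poisson_slope(y, x, n_iter=50):
    """Newton-Raphson fit of log E[y_t] = a + b * x_t; returns b."""
    X = np.column_stack([np.ones_like(x), x])
    coef = np.array([np.log(y.mean()), 0.0])
    for _ in range(n_iter):
        mu = np.exp(X @ coef)
        step = np.linalg.solve(X.T @ (X * mu[:, None]), X.T @ (y - mu))
        coef += step
        if np.max(np.abs(step)) < 1e-12:
            break
    return coef[1]


rng = np.random.default_rng(1)
pm = rng.gamma(shape=4.0, scale=8.0, size=487)  # hypothetical raw PM10 readings
deaths = rng.poisson(np.exp(np.log(83) + 0.001 * (pm - pm.mean())))

b_raw = fit_poisson_slope(deaths, pm)
b_doubled = fit_poisson_slope(deaths, 2 * pm)             # coefficient halves
b_std = fit_poisson_slope(deaths, standardize(pm))
b_std_doubled = fit_poisson_slope(deaths, standardize(2 * pm))  # unchanged
print(b_doubled / b_raw, b_std, b_std_doubled)
```

Since `standardize(2 * pm)` equals `standardize(pm)`, the standardized coefficient measures the effect of a one-standard-deviation change regardless of the units or calibration of each monitor.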
3. Methodology
In many recent community time series studies on the
effect of PM10 on mortality, an additive model of the
following form is fit to the time series of observed
mortality:
$$f(\mu_t) = \mu + \mathrm{confounders}_t + \beta\,\mathrm{PM}_t, \qquad (1)$$

where t is the observation day, $\mu_t$ the mean number of deaths on day t, $\mu$ the intercept term, f a link function (typically the square-root, log or identity function), $\mathrm{confounders}_t$ the other time-varying variables that are related to daily mortality (typical confounders include temperature, humidity, longer-term mortality trends and seasonality), $\mathrm{PM}_t$ the PM10 concentration time series, and $\beta$ the effect of PM10 on mortality (it gives the increase in $f(\mu_t)$ per unit increase in PM10).

[Fig. 1. Map of Cook County, Illinois, showing the locations of the six PM10 monitors (numbered 1–6; Chicago, Arlington Heights and Lake Michigan are labelled). The distance between monitors 1 and 4 is 17.7 km.]

Table 1
Summary of data available from the Cook County, Illinois PM10 monitors, within the six-day collection schedule, for the period 1987–1994, inclusive

Monitor   Years   Days of data   % Missing   Land use   Location setting
1         87–94   463            5           CO         SU
2         87–94   440            10          IN         SU
3         87–94   455            7           CO         CI
4         87–94   417            14          CO         SU
5         87–94   469            4           IN         SU
6         87–94   457            6           RE         SU

Monitor gives the monitor number. Years gives the years the monitor was in operation. Days of data gives the number of days on which the monitor collected a PM10 concentration. % Missing gives the percentage of days on which the monitor was missing a PM10 concentration. Land use is the prevalent land use within ¼ mile of the monitor. Key: CO, commercial; IN, industry; RE, residential. Location setting is the type of environment in which the monitor is located. Key: CI, city; SU, suburban.
In Dominici et al. (2000, 2002a) and Samet et al. (2000a), the link f is taken to be the log function, the confounder adjustments use smooth non-parametric functions and indicator variables, and $\mathrm{PM}_t$ is taken to be either the current-day or a previous-day PM10 concentration. Davis et al. (1996) take the link f to be the square-root function, adjust for the confounders parametrically, and take $\mathrm{PM}_t$ to be the average of the current and previous two days' PM10 concentrations. The $\beta$ estimates from these studies were used to quantify the effect of PM10 on mortality.
When there are multiple PM10 monitors, the data from
these monitors are typically combined into one PM10
time series. Dominici et al. (2000) and Samet et al.
(2000a) first adjust the values at each monitor by its
respective yearly average. Then for each day a 10%
trimmed average of the adjusted series is used to
represent the area-wide PM10 for that day. Davis et al.
(1996) combine the data from the multiple monitors
using a simple average. The resulting single PM10 time
series is then used in model (1).
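The two combination rules just described can be sketched as follows. The sketch assumes that "adjusting by the yearly average" means subtracting each monitor's own mean for the calendar year; the cited studies may define the adjustment differently. It also uses the convention of dropping floor(0.1 × n) readings from each end, so with six monitors the 10% trimmed average coincides with the simple average. Function names and data are hypothetical.

```python
import numpy as np


def yearly_centered(pm, year):
    """Subtract each monitor's own mean for each calendar year.

    This is one reading of 'adjust the values at each monitor by its yearly
    average'; the exact adjustment used in the cited studies may differ."""
    pm = np.asarray(pm, dtype=float)
    adjusted = np.empty_like(pm)
    for y in np.unique(year):
        rows = year == y
        adjusted[rows] = pm[rows] - np.nanmean(pm[rows], axis=0)
    return adjusted


def trimmed_mean_rows(a, prop=0.1):
    """Day-by-day trimmed mean across monitors, ignoring NaNs.

    floor(prop * n) values are dropped from each end, where n is the number
    of monitors reporting that day."""
    a = np.asarray(a, dtype=float)
    out = np.full(a.shape[0], np.nan)
    for t, row in enumerate(a):
        vals = np.sort(row[~np.isnan(row)])
        cut = int(np.floor(prop * vals.size))
        kept = vals[cut:vals.size - cut]
        if kept.size:
            out[t] = kept.mean()
    return out


rng = np.random.default_rng(5)
year = np.repeat(np.arange(1987, 1995), 61)[:487]  # rough year label per reading
pm = rng.gamma(4.0, 8.0, size=(487, 6))            # hypothetical six-monitor data
adjusted = yearly_centered(pm, year)
trimmed = trimmed_mean_rows(adjusted, 0.1)
simple = np.nanmean(adjusted, axis=1)
# With six monitors, floor(0.1 * 6) = 0, so nothing is trimmed.
print(np.allclose(trimmed, simple))
```

Trimming only begins to differ from a simple average once at least ten monitors report on a given day under this convention.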
In an air pollution mortality time series study with k PM10 monitors, model (1) may be extended to

$$f(\mu_t) = \mu + \mathrm{confounders}_t + \beta_1\,\mathrm{PM}_{t1} + \beta_2\,\mathrm{PM}_{t2} + \cdots + \beta_k\,\mathrm{PM}_{tk}, \qquad (2)$$

where $\mathrm{PM}_{ts}$ represents the standardized PM10 time series for monitor s. The PM10 component of model (2) can be rewritten in the following form:

$$\sum_{s=1}^{k} \beta_s\,\mathrm{PM}_{ts} = \beta \left( \sum_{s=1}^{k} \alpha_s\,\mathrm{PM}_{ts} \right), \qquad (3)$$

where $\beta = \sum_{s=1}^{k} \beta_s$ and $\sum_{s=1}^{k} \alpha_s = 1$, so that $\alpha_s = \beta_s / \beta$ whenever $\beta \neq 0$. In this form, we can see that the PM10 coefficients can be written as a weighted average with weights $\alpha_s$ that sum to one, and $\beta$ is the increase in the mortality index $f(\mu_t)$ if the PM10 level at each monitor is one standard deviation above its mean level. Studies that use an average of the available PM10 monitors assume that the $\alpha_s$ values are all equal.
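The rewriting in Eq. (3) can be verified numerically. The coefficients and series below are hypothetical; the sketch splits the monitor coefficients into the total effect and the weights, checks that both sides of Eq. (3) agree, and shows that an equal-weight (averaging) combination corresponds to $\alpha_s = 1/k$.

```python
import numpy as np


def to_effect_and_weights(betas):
    """Split coefficients beta_s into beta = sum(beta_s) and weights
    alpha_s = beta_s / beta, which sum to one (requires beta != 0)."""
    betas = np.asarray(betas, dtype=float)
    beta = betas.sum()
    return beta, betas / beta


rng = np.random.default_rng(2)
pm = rng.normal(size=(487, 3))            # three standardized series (synthetic)
betas = np.array([0.010, 0.004, 0.006])   # hypothetical monitor coefficients
beta, alpha = to_effect_and_weights(betas)

lhs = pm @ betas                          # sum_s beta_s * PM_ts
rhs = beta * (pm @ alpha)                 # beta * sum_s alpha_s * PM_ts
shift = np.ones(3) @ betas                # every monitor one SD above its mean
equal_weights = pm @ np.full(3, beta / 3) # the averaging assumption, alpha_s = 1/3
print(beta, alpha)
```

Because every series is standardized, raising each monitor by one standard deviation changes the linear predictor by exactly $\beta$, which is why $\beta$ is interpretable across monitors.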
I further constrain the $\beta_s$ to be non-negative. This constraint implies that increases in PM10 cannot decrease the expected mortality rate; that is, the $\beta_s$ are constrained to be biologically plausible. Model (2) can also be fit without placing constraints on the $\beta_s$. It will be shown that the increase in bias from constraining the $\beta_s$ to be non-negative is offset by the decrease in variance. Constraining the $\beta_s$ has other desirable properties that will be discussed later.
Once the form of model (2) is chosen, the model parameters are fit by constrained maximum likelihood. I obtain the constrained maximum likelihood estimates iteratively in two repeated steps. The first step fixes the parameters corresponding to the PM10 terms, the $\beta_s$, at their current values and estimates the parameters corresponding to the confounders. This can be done using GLM software. The second step fixes the parameters corresponding to the confounders at their current values and estimates the parameters corresponding to the PM10 terms. This can be done using constrained optimization software. These two steps are iterated to obtain the final estimates of the $\beta_s$. The iterative method is summarized as follows:
1. Choose starting values for the $\beta_s$.
2. Fix the $\beta_s$. Estimate the confounder parameters.
3. Fix the confounder parameters. Re-estimate the $\beta_s$.
4. Fix the $\beta_s$. Re-estimate the confounder parameters.
5. Repeat steps 3 and 4 until the PM10 effect estimates, the $\beta_s$, converge.
At each step, the parameters are estimated by maximum likelihood. The constrained likelihood is bounded, and no step of the iterative estimation procedure can decrease it. Thus the sequence of constrained likelihood values produced by the iterative estimation procedure converges.
This method will be referred to as the constrained multiple monitor method (constrained MMM). The final $\beta_s$ values can then be used to form $\beta$ and the $\alpha_s$, as defined in Eq. (3). The $\alpha_s$ can be used to identify a subset of the monitors that captures as much PM10 mortality variation as all the monitors combined. If the $\alpha_s$ values for some monitors are zero, or near zero, then constrained MMM is re-fit using a reduced set of monitors.
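The alternating procedure can be sketched in NumPy as follows. This is an illustrative sketch under several assumptions, not the author's implementation: it uses a Poisson log link, represents the confounders by a generic design matrix `Z` whose first column is an intercept (no spline terms are built), and replaces the unspecified constrained optimization software in step 3 with cyclic coordinate ascent using Newton steps clipped at zero, without step-halving safeguards. The function names and the synthetic data are hypothetical.

```python
import numpy as np


def fit_confounders(y, Z, offset, gamma, n_iter=50, tol=1e-10):
    """Steps 2 and 4: Poisson maximum likelihood for the confounder
    coefficients gamma (Newton-Raphson), with the PM10 part held fixed
    as an offset."""
    for _ in range(n_iter):
        mu = np.exp(offset + Z @ gamma)
        step = np.linalg.solve(Z.T @ (Z * mu[:, None]), Z.T @ (y - mu))
        gamma = gamma + step
        if np.max(np.abs(step)) < tol:
            break
    return gamma


def fit_pm_nonneg(y, X, offset, beta_s, n_sweeps=200, tol=1e-10):
    """Step 3: maximize the Poisson likelihood over beta_s >= 0 with the
    confounder part held fixed, by cyclic coordinate ascent. Each 1-D problem
    is concave; Newton steps clipped at zero move toward its constrained
    maximizer."""
    beta_s = beta_s.copy()
    eta = offset + X @ beta_s
    for _ in range(n_sweeps):
        largest = 0.0
        for s in range(X.shape[1]):
            for _ in range(25):  # Newton iterations for coordinate s
                mu = np.exp(eta)
                grad = X[:, s] @ (y - mu)
                hess = (X[:, s] ** 2) @ mu
                new = max(0.0, beta_s[s] + grad / hess)
                change = new - beta_s[s]
                eta += change * X[:, s]
                beta_s[s] = new
                largest = max(largest, abs(change))
                if abs(change) < tol:
                    break
        if largest < tol:
            break
    return beta_s


def constrained_mmm(y, Z, X, max_outer=200, tol=1e-8):
    """Alternate steps 2-4 until the beta_s converge.
    Returns (beta, alpha_s, beta_s, gamma)."""
    k = X.shape[1]
    beta_s = np.zeros(k)              # step 1: starting values
    gamma = np.zeros(Z.shape[1])
    gamma[0] = np.log(y.mean())       # assumes column 0 of Z is the intercept
    for _ in range(max_outer):
        gamma = fit_confounders(y, Z, X @ beta_s, gamma)
        previous = beta_s
        beta_s = fit_pm_nonneg(y, X, Z @ gamma, beta_s)
        if np.max(np.abs(beta_s - previous)) < tol:
            break
    beta = beta_s.sum()
    alpha_s = beta_s / beta if beta > 0 else np.full(k, np.nan)
    return beta, alpha_s, beta_s, gamma


# Hypothetical data: four noisy, correlated monitors of one latent series.
rng = np.random.default_rng(3)
n = 487
latent = rng.normal(size=n)
X = latent[:, None] + rng.normal(scale=0.7, size=(n, 4))
X = (X - X.mean(axis=0)) / X.std(axis=0)         # standardize each monitor
season = np.sin(2 * np.pi * np.arange(n) / 61.0)  # toy confounder
Z = np.column_stack([np.ones(n), season])
y = rng.poisson(np.exp(np.log(83) + 0.05 * season + 0.03 * latent))

beta, alpha_s, beta_s, gamma = constrained_mmm(y, Z, X)
print(beta, alpha_s)
```

Because the Poisson log-likelihood is concave in the $\beta_s$ for fixed confounder parameters, step 3 has reached its constrained maximum when every positive $\beta_s$ has zero score and every zero $\beta_s$ has non-positive score; checking these conditions is a convenient test of any implementation. Monitors with $\alpha_s = 0$ can then be dropped and the fit repeated on the reduced set.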
Constrained MMM can be extended to include co-pollutants by entering them in model (2) in the same way as PM10.
4. Simulations
I use simulations to compare constrained MMM with using a simple average of the monitors (MAS) or an unconstrained MMM (MMM fit without constraining the $\beta_s$ to be non-negative).
In the simulations, it is assumed that there is an underlying, unobserved PM10 time series that is related to daily mortality. The PM10 monitors are correlated, to varying degrees, with the underlying PM10 time series. The goal is to estimate, using the PM10 time series from the monitoring sites, the effect of the underlying PM10 time series on mortality. For the simulation study, I took the underlying PM10 time series to be the Cook County monitor 1 time series, denoted $\mathrm{PM}^u_t$. The remaining five Cook County monitors are then used to estimate the PM10 mortality effect, via constrained MMM, MAS or unconstrained MMM.
Using $\mathrm{PM}^u_t$, the log mean of the simulated Poisson mortality time series is given by

$$\log(\mu_t) = \log(83) + \phi\,\mathrm{PM}^u_t,$$

where 83 is the average daily mortality in Cook County and $\phi$ is the mortality effect of the underlying PM10 time series. The $\phi$ values used in the simulations span the plausible range for the PM10 mortality effect. For simplicity, confounders were not included in the simulations.
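The simulation design can be sketched for the MAS estimator as follows. Because the Cook County series are not reproduced here, the underlying series and the five monitors are hypothetical stand-ins (a standardized random series and noisy copies of it); the constrained and unconstrained MMM fits would be plugged in where `fit_poisson_slope` is applied to the monitor average. The SD is computed with divisor `n_rep`, which is an assumption.

```python
import numpy as np


def fit_poisson_slope(y, x, n_iter=50):
    """Newton-Raphson fit of log E[y_t] = a + b * x_t; returns b."""
    X = np.column_stack([np.ones_like(x), x])
    coef = np.array([np.log(y.mean()), 0.0])
    for _ in range(n_iter):
        mu = np.exp(X @ coef)
        step = np.linalg.solve(X.T @ (X * mu[:, None]), X.T @ (y - mu))
        coef += step
        if np.max(np.abs(step)) < 1e-12:
            break
    return coef[1]


def standardize(a):
    """Zero mean, unit variance along axis 0 (each column of a 2-D array)."""
    return (a - a.mean(axis=0)) / a.std(axis=0)


def simulate_mas(phi, pm_u, monitors, n_rep=200, seed=0):
    """Simulate n_rep Poisson series with log(mu_t) = log(83) + phi * PM^u_t
    and estimate phi from the simple average of the standardized monitors.
    Returns the mean, SD (divisor n_rep) and RMSE of the estimates."""
    rng = np.random.default_rng(seed)
    mas = standardize(monitors).mean(axis=1)
    mu = np.exp(np.log(83) + phi * pm_u)
    est = np.array([fit_poisson_slope(rng.poisson(mu), mas) for _ in range(n_rep)])
    rmse = np.sqrt(np.mean((est - phi) ** 2))
    return est.mean(), est.std(), rmse


# Hypothetical stand-ins: an underlying series and five monitors tracking it.
rng = np.random.default_rng(4)
pm_u = standardize(rng.normal(size=487))
monitors = pm_u[:, None] + rng.normal(scale=0.8, size=(487, 5))
for phi in (0.0, 0.01, 0.03):
    mean, sd, rmse = simulate_mas(phi, pm_u, monitors)
    print(f"phi={phi:.3f}  mean={mean:.4f}  sd={sd:.4f}  rmse={rmse:.4f}")
```

Each replicate keeps the 487-day length of the Cook County series, so the spread of the estimates reflects the sample size used in the application.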
For each value of $\phi$, I generated 200 mortality time series of length 487 days, to match the length of the Cook County data set. Constrained MMM, MAS and unconstrained MMM were then applied to the generated mortality time series to estimate $\phi$. For each value of $\phi$, Table 2 shows the mean, standard deviation and root mean squared error (RMSE) of the 200 estimates. In each case, the RMSE for constrained MMM is similar to, and never larger than, the RMSE for MAS or unconstrained MMM. The larger bias of the constrained MMM estimator is compensated for by a decrease in estimation variance. (These simulations were repeated using each of the other five monitors as the underlying PM10 time series. In each case, results similar to those for monitor 1 were obtained.)
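The bias and variance trade-off can be read directly off Table 2 through the standard decomposition of mean squared error. The decomposition is exact if SD is computed with divisor 200 (the table does not state which divisor was used) and is only approximate here because the tabulated values are rounded. For $\phi = 0$:

$$\mathrm{RMSE}^2 = (\mathrm{Mean} - \phi)^2 + \mathrm{SD}^2,$$
$$\text{constrained MMM: } \sqrt{0.0036^2 + 0.0038^2} = \sqrt{2.74 \times 10^{-5}} \approx 0.0052,$$
$$\text{unconstrained MMM: } \sqrt{0.0005^2 + 0.0059^2} = \sqrt{3.506 \times 10^{-5}} \approx 0.0059.$$

The constrained estimator carries a larger squared bias, but its smaller SD more than compensates, giving the lower RMSE.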
Another advantage of constrained MMM is that it can assign zero weight to a monitor. This allows the full set of monitors to be reduced to a smaller set, for the purpose of estimating the PM10 mortality effect, without loss of information. The last column in Table 2, 0 Weight, contains the median number of monitors that were given exactly zero weight by constrained MMM.
5. Application to Cook County, Illinois
In this section, constrained MMM is applied to the actual mortality time series and PM10 monitoring data from Cook County, as described in Section 2. I fit a Poisson log-linear model similar to those used in Dominici et al. (2002b, 2003). The model I used was

$$\log(\mu_t) = \mu + S_1(\mathrm{time}, 8/\mathrm{year}) + S_2(\mathrm{temp}_0, 6) + S_3(\mathrm{temp}_{1-3}, 6) + S_4(\mathrm{dew}_0, 3) + S_5(\mathrm{dew}_{1-3}, 3) + \gamma\,\mathrm{DOW}_t + \beta\,\mathrm{PM}_t, \qquad (4)$$

where t is the observation day, $\mu_t$ the mean number of deaths on day t, $\mu$ the intercept term, $S_1(\mathrm{time}, 8/\mathrm{year})$ a smooth function of time with eight degrees of freedom per year, $S_2(\mathrm{temp}_0, 6)$ and $S_3(\mathrm{temp}_{1-3}, 6)$ smooth functions, each with six degrees of freedom, where $\mathrm{temp}_0$ is the current day's 24-h mean temperature and $\mathrm{temp}_{1-3}$ the average of the previous three days' 24-h mean temperatures, $S_4(\mathrm{dew}_0, 3)$ and $S_5(\mathrm{dew}_{1-3}, 3)$ similar functions for the 24-h mean dew point temperature, $\mathrm{DOW}_t$ a set of indicator variables for the day of the week, and $\gamma$ a vector of coefficients that contains the mortality adjustments for the day of the week. All smooth functions are represented using natural cubic splines.
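The lagged covariates and day-of-week indicators in model (4) can be built as below. This is a hypothetical helper sketch: the natural cubic spline bases for the smooth terms are not constructed here, and the choice of Monday as the omitted reference day is an assumption.

```python
import datetime

import numpy as np


def lag_mean(x, lags):
    """Average of x over the given day lags (0 = current day); NaN where a
    lag reaches before the start of the series. lag_mean(temp, [1, 2, 3])
    gives temp_{1-3}, the mean of the previous three days."""
    x = np.asarray(x, dtype=float)
    stacked = np.full((len(lags), x.size), np.nan)
    for i, lag in enumerate(lags):
        stacked[i, lag:] = x[:x.size - lag]
    return stacked.mean(axis=0)


def dow_indicators(start_date, n_days):
    """n_days x 6 matrix of day-of-week indicators (Monday is the omitted
    reference level) for a daily series beginning on start_date."""
    weekday = (start_date.weekday() + np.arange(n_days)) % 7  # Monday = 0
    return (weekday[:, None] == np.arange(1, 7)).astype(float)


temp = np.array([20.0, 22.0, 21.0, 25.0, 24.0, 23.0, 26.0])  # hypothetical daily means
temp0 = lag_mean(temp, [0])
temp1_3 = lag_mean(temp, [1, 2, 3])
dow = dow_indicators(datetime.date(1987, 1, 1), temp.size)
print(temp1_3)  # the first three entries are NaN
```

The leading NaNs in `temp1_3` mean the first three days of the series drop out of the fit unless earlier temperature data are available.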
The PM10 time series used here is the current-day PM10 level. An exploratory analysis suggested that the current-day PM10 level gives more significant results than the PM10 level on either of the two previous days. I do not account for this selection effect.
Table 2
Results of simulations comparing constrained MMM (MMM), the simple average of the remaining five monitors (MAS) and unconstrained MMM (MMMuc), using monitor 1 as the underlying PM10 time series

φ       Model   Mean      SD       RMSE     0 Weight
0       MMMuc   −0.0005   0.0059   0.0059
        MMM      0.0036   0.0038   0.0052   4
        MAS     −0.0004   0.0058   0.0058
0.005   MMMuc    0.0043   0.0056   0.0057
        MMM      0.0070   0.0044   0.0049   3
        MAS      0.0043   0.0056   0.0056
0.01    MMMuc    0.0094   0.0054   0.0054
        MMM      0.0115   0.0050   0.0052   3
        MAS      0.0095   0.0054   0.0054
0.02    MMMuc    0.0188   0.0057   0.0058
        MMM      0.0203   0.0053   0.0053   2
        MAS      0.0190   0.0057   0.0057
0.03    MMMuc    0.0294   0.0051   0.0051
        MMM      0.0304   0.0048   0.0048   2
        MAS      0.0296   0.0051   0.0051
0.06    MMMuc    0.0587   0.0059   0.0060
        MMM      0.0594   0.0058   0.0058   1
        MAS      0.0592   0.0059   0.0059

φ is the effect of the underlying PM10 time series on mortality. Model is the model that is fit to the simulated mortality time series in order to estimate φ. Mean is the mean of the 200 φ estimates. SD is the standard deviation of the 200 φ estimates. RMSE is the root mean squared error of the 200 φ estimates. 0 Weight is the median number of monitors that received exactly zero weight from constrained MMM.

Table 3 contains the results of $\beta$ estimation by several methods:
1. Single PM10 monitors.
2. The simple average of the six PM10 monitors (MAS).
3. Constrained MMM.
4. Unconstrained MMM.
The first six rows correspond to the six monitors used individually. For rows 1–6, Estimate is the estimated percentage increase in mortality for a one standard deviation increase in the PM10 level at that monitor. For MAS, constrained MMM and unconstrained MMM, Estimate is the estimated percentage increase in mortality if the PM10 level increases by one standard deviation at each of the six monitors. The estimated percentage increase in mortality is given by $100(e^{\beta} - 1) \approx 100\beta$. SE is the standard error of the estimated percentage increase in mortality. The standard error for constrained MMM is calculated using a parametric bootstrap. Using the parameter estimates from constrained MMM, 100 new mortality time series are simulated, and constrained MMM is applied to each simulated series to obtain a $\beta$ estimate. The standard error is calculated as the standard deviation of the 100 constrained MMM $\beta$ estimates.
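The percentage-increase conversion and the parametric bootstrap described above can be sketched as follows. In this hypothetical sketch a single-series Poisson fit stands in for constrained MMM to keep it short, the fitted means `mu_hat` are synthetic, and the standard deviation uses divisor n − 1 (an assumption; the text does not say).

```python
import numpy as np


def pct_increase(beta):
    """Percentage increase in mortality implied by a log-link coefficient:
    100 * (exp(beta) - 1), close to 100 * beta when beta is small."""
    return 100.0 * np.expm1(beta)


def parametric_bootstrap_se(mu_hat, estimator, n_boot=100, seed=0):
    """Simulate n_boot Poisson series from the fitted means mu_hat, apply
    `estimator` to each, and return the standard deviation of the results."""
    rng = np.random.default_rng(seed)
    estimates = np.array([estimator(rng.poisson(mu_hat)) for _ in range(n_boot)])
    return estimates.std(ddof=1)


def fit_poisson_slope(y, x, n_iter=50):
    """Stand-in estimator: Newton-Raphson fit of log E[y_t] = a + b * x_t."""
    X = np.column_stack([np.ones_like(x), x])
    coef = np.array([np.log(y.mean()), 0.0])
    for _ in range(n_iter):
        mu = np.exp(X @ coef)
        step = np.linalg.solve(X.T @ (X * mu[:, None]), X.T @ (y - mu))
        coef += step
        if np.max(np.abs(step)) < 1e-12:
            break
    return coef[1]


rng = np.random.default_rng(6)
pm = rng.normal(size=487)                  # hypothetical standardized PM10 series
mu_hat = np.exp(np.log(83) + 0.02 * pm)    # hypothetical fitted means
se = parametric_bootstrap_se(mu_hat, lambda y: pct_increase(fit_poisson_slope(y, pm)))
print(pct_increase(np.log(1.0268)), se)
```

Applying the estimator to each simulated series and transforming inside the bootstrap gives the standard error directly on the percentage scale reported in Table 3.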
The weights $\alpha_s$ assigned to the six monitors by constrained MMM are shown in column 4 of Table 3. The zero weights given to monitors 1, 3, 5 and 6 are exact; they are not near-zero weights that have been rounded to zero.
The results for the six single monitor models show
that the location of the monitor can have an important
bearing on the PM10 mortality effect estimate. For
example, monitors 2 and 4 give PM10 mortality effect
estimates that are larger than the PM10 mortality effect
estimates from the other four monitors. This is similar to
what Ito et al. (1995) found. This raises an important
issue for air pollution mortality studies, particularly in
areas with only a small number of monitors. In such
regions, these results show that the estimate for the
effect of PM10 on mortality may depend on the monitor
location even after standardizing. The constrained
MMM weights suggest that monitors 2 and 4 will
capture as much of the PM10 mortality variation as all
six monitors. Fig. 1 shows the location of these two
monitors.
Using monitors 2 and 4 alone, model (4), with a separate PM10 term for each monitor as in model (2), was fit using the standard GLM estimating procedure (GLMstd), which does not constrain the $\beta_s$. The estimated $\beta_s$ from GLMstd are essentially the same as those obtained by the constrained MMM iterative estimation procedure, which constrains the $\beta_s$. This provides a check that the iterative constrained MMM estimation procedure is working correctly.
6. Discussion and conclusion
In this paper, I explored issues related to multiple
PM10 monitors in community PM10 mortality time series
studies. I showed, using data from Cook County, how
the PM10 mortality effect estimate depends on the
location of the monitor. Next, I applied a method of
optimally combining the PM10 data available from
multiple monitors (constrained MMM), and compared
it to using an average of monitors (MAS) and
unconstrained MMM. Based on a simulation study,
constrained MMM mortality estimates appear to have
the following advantages:
1. More meaningful, biologically plausible PM10 mortality effect estimates without loss of statistical accuracy.
2. The full set of monitors can be reduced to a smaller
set, for the purposes of estimating PM10 mortality
effects, without losing information.
If a monitor is given zero weight by constrained MMM, it does not mean that this monitor is less important for describing the PM10 mortality relationship than another monitor that is given non-zero weight. For example, consider a study with three PM10 monitors where
monitors 1 and 2 are highly correlated, and these two
monitors are more strongly associated with community
mortality than monitor 3. In this situation, constrained
MMM may give non-zero weight to monitors 1 and 3,
and zero weight to monitor 2. This does not mean
monitor 3 is more important than monitor 2. It only
means that in the presence of monitor 1, monitor 3
describes more of the remaining PM10 mortality varia-
tion than monitor 2.
Simulations showed that constraining the MMM estimates to be biologically plausible results in no loss of statistical estimation precision: the biases in constrained MMM are compensated for by correspondingly lower estimation variance.
Table 3
Results of fitting the single-monitor models, the simple average of the six monitors (MAS), unconstrained MMM (MMMuc) and constrained MMM (MMM) to the Cook County, Illinois mortality data

Monitor/Model   Estimate   SE     Wgt MMM
1               1.07       0.72   0
2               1.94       0.70   0.49
3               0.81       0.79   0
4               1.97       0.70   0.51
5               1.17       0.72   0
6               0.34       0.69   0
MAS             1.83       0.88
MMMuc           1.96       0.88
MMM             2.68       0.80

Monitor/Model gives the monitor or model that is being used. Estimate gives the PM10 mortality effect estimate (percentage increase in mortality) for the monitor or model in question. SE is the standard error of the PM10 mortality effect estimate. Wgt MMM gives the exact weights assigned to the individual monitors by constrained MMM.

In a community study of the effect of PM10 on mortality, the relevant PM10 variable should be the average population exposure to PM10, which is not directly available. Instead, the ambient PM10 concentration is measured at multiple monitoring sites. Using the PM10 data from the multiple monitoring sites, it is desirable to form a new PM10 time series that is closely correlated with the average population exposure to PM10. Constrained MMM, which assigns data-driven weights to each monitor, gives a larger PM10 mortality effect estimate than a simple average of the monitors (2.68% versus 1.83% in Table 3). This suggests that a combined PM10 time series obtained from monitors 2 and 4 might be more highly correlated with the average population exposure to PM10 than the average of all monitors. It would be of interest to explore this hypothesis using a personal exposure survey.
The question of which PM10 monitors are representa-
tive of the population’s exposure to PM10 has been
raised as an open question by Clyde (2000). I believe the
method developed here is a step forward in answering
this question. Using constrained MMM, I have shown that it may be possible to reduce the full set of PM10 monitors to a smaller set without losing much information for PM10 mortality studies. Applied to Cook County, constrained MMM showed that the six PM10 monitors there could be reduced to only two monitors for this purpose. This information could be useful in a regulatory setting and possibly for shifting monitoring resources.
Acknowledgements
I would like to thank Paul Switzer for his comments,
Allen Lefohn for the multiple monitor data and Scott
Zeger for the combined Cook County mortality and
meteorological data used in this paper. The author was
partially supported by a Stanford Graduate Fellowship.
References
Clyde, M., 2000. Model uncertainty and health effect studies for
particulate matter. Environmetrics 11, 745–763.
Davis, J.M., Sacks, J., Saltzmann, N., Smith, R.L., Styer, P.,
1996. Airborne particulate matter and daily mortality in
Birmingham, Alabama. NISS Technical Report No. 55.
Dominici, F., Samet, J.M., Zeger, S.L., 2000. Combining
evidence on air pollution and daily mortality from the
20 largest US cities: a hierarchical modelling strategy.
Journal of the Royal Statistical Society, Series A 163,
263–302.
Dominici, F., Daniels, M., Zeger, S.L., Samet, J.M., 2002a. Air
pollution and mortality: estimating regional and national
dose-response relationships. Journal of the American
Statistical Association 97, 100–111.
Dominici, F., McDermott, A., Zeger, S.L., Samet, J.M., 2002b.
On the use of generalized additive models in time series
studies of air pollution and health. American Journal of
Epidemiology 156, 193–203.
Dominici, F., McDermott, A., Hastie, T., 2003. Issues in semiparametric regression with applications in time series studies for air pollution and mortality. Technical Report, http://biosun01.biostat.jhsph.edu/~fdominic/jasafeb6.pdf.
Hastie, T.J., Tibshirani, R.J., 1990. Generalized Additive Models. Chapman and Hall, London.
Ito, K., Kinney, P.L., Thurston, G.D., 1995. Variations in
PM10 concentrations within two metropolitan areas and
their implications for health effects analyses. Inhalation
Toxicology 7, 735–745.
McCullagh, P., Nelder, J.A., 1989. Generalized Linear Models,
2nd Edition. Chapman and Hall, London.
Samet, J.M., Dominici, F., Curriero, F.C., Coursac, I., Zeger,
S.L., 2000a. Fine particulate air pollution and mortality in
20 US Cities, 1987–1994. The New England Journal of
Medicine 343 (24), 1742–1749.
Samet, J.M., Zeger, S.L., Dominici, F., Curriero, F., Coursac,
I., Dockery, D.W., Schwartz, J., Zanobetti, A., 2000b. The
National Morbidity, Mortality, and Air Pollution Study
Part II: Morbidity and Mortality from Air Pollution in the
United States. Number 94, The Health Effects Institute,
Cambridge, MA.
Schwartz, J., Zanobetti, A., 2000. Using meta-smoothing to
estimate dose-response trends across multiple studies, with
application to air pollution and daily death. Epidemiology
11 (6), 666–672.
Smith, R.L., Davis, J.M., Speckman, P., 1999. Human health
effects of environmental pollution in the atmosphere. In:
Barnett, V., Stein, A., Turkman, F. (Eds.), Statistics in
the Environment 4: Statistical Aspects of Health and
the Environment. John Wiley, Chichester, pp. 91–115
(Chapter 6).