Assessing the Health Effects of Air Pollution; Statistical and Computational Challenges
Scott L. Zeger on behalf of
The Environmental Biostatistics and Epidemiology Group (EBEG), The Johns Hopkins University
Bloomberg School of Public Health
CISES Meeting – Chicago
October, 2004
Key Collaborators
• Francesca Dominici
• Aidan McDermott
• Jon Samet
• Roger Peng
• Leah Welty
• Hopkins Environmental Biostatistics and Epidemiology Group (EBEG)
http://www.biostat.jhsph.edu/bstproj/ebeg/
Sources of Support
• U.S. National Institutes of Health (NIH)
• U.S. Environmental Protection Agency (EPA)
• Health Effects Institute (HEI), an independent non-profit that receives funds from:
– U.S. EPA
– Automobile Manufacturers Association
Outline
• Air pollution and mortality: a brief overview of the epidemiologic evidence
– Cohort studies
– Time series studies - NMMAPS
• Spatial-time series models
– Temporal then spatial models
• Key statistical issues
• Toward reproducible research
Can Air Pollution Kill at Doses an Order of Magnitude Lower?
• “Air pollution”: many constituents
– Particles (<2.5 microns penetrate to deep lung)
– Ozone
– Gases: NO2, SO2, CO
– …
• Focus on particles because of epidemiologic data
Key Epidemiologic Evidence
• Chronic exposures: cohort studies
– Six Cities Study (e.g. Dockery et al., 1993)
– American Cancer Society Study (e.g. Pope et al., 2002)
• Acute exposures: multi-city time series studies
– NMMAPS (90 U.S. cities; e.g. Samet et al., 2000)
– APHEA (29 European cities; e.g. Katsouyanni et al., 2003)
– Canadian (8 cities; e.g. Burnett, Goldberg, 2003)
Cohort Studies
                      Six Cities                   ACS
People                8,111                        500,000+
Person years          111,076                      7.5M
Deaths                1,430                        60,000+
Cities                6                            50
Exposure              Yearly average               Yearly average
Covariates            Age, smoking, exercise, +    Age, smoking, exercise, +
Total mortality RR    1.26*                        1.10*
Cardio-pulmonary RR   1.37*                        1.17*
Lung cancer RR        1.37*                        1.29*
* Most vs. least polluted, in the Six Cities Study
Public Health Significance
In the U.S., EPA estimates on the order of 10,000 particle-attributable deaths per year if the cohort relative risks represent a causal effect
Smoking: 400,000 smoking-attributable deaths per year
Caveats on Cohort Studies
• Regressions of “adjusted” mortality rates on longer-term average pollution level
• Cross-city ecologic comparisons
• Sample size is number of cities
– 6CS – 6
– ACS – 50
• What else is different between higher and lower polluted cities?
• Does air pollution cause mortality?
Multi-city Time Series Studies of Acute Effects
• Compare higher to lower polluted days within the same community
• Avoid problem of unmeasured differences among cities
• New confounders
– Longer-term trends in population characteristics, medical practice, smoking rates, changing demographics, etc
– Seasonal effects of infectious diseases and weather
– Day of month, week, holidays
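As a rough sketch of how a city-specific time series regression can adjust for these confounders, one might write the following log-linear model. It is consistent with the over-dispersed Poisson model and the smooth functions of time and temperature mentioned later in the talk, but the symbols (Y_t for the daily death count, u for the lag, s_1 and s_2 for smooth functions, gamma for day-of-week effects, phi for over-dispersion) are notation assumed here for illustration, not taken from the slides.

```latex
\log \mathrm{E}[Y_t] \;=\; \beta_0 \;+\; \beta\,\mathrm{PM}_{t-u}
  \;+\; s_1(t) \;+\; s_2(\mathrm{temp}_t) \;+\; \gamma_{\mathrm{DOW}(t)},
\qquad
\mathrm{Var}(Y_t) \;=\; \phi\,\mathrm{E}[Y_t]
```

Here s_1(t) absorbs longer-term trends and seasonality, s_2 handles weather, and the day-of-week term handles calendar effects, so that beta reflects the association at short time scales only.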
Risk Estimates From Cohort and Time Series Studies
• Cohort studies estimate association between time-to-death and long-term exposure to air pollution (chronic exposure)
• Time Series studies estimate association between risk of death and the level of air pollution shortly before death conditional on longer-term exposures (acute exposure)
Time series studies of particulate pollution are useful to address the causal question, not to estimate the size of health effects. They ignore chronic exposures.
National Morbidity and Mortality Air Pollution Study (NMMAPS)
• HEI funded collaboration of Johns Hopkins and Harvard Universities; Jon Samet, PI
• 90 largest U.S. cities covering roughly 40% of annual deaths (now 105)
• 1987-1994; now updated through 2001
• Mortality and hospitalizations (14 cities)
Three Models
• “Three stage” - as in previous slide
• “Two stage” - ignore region effects; assume cities have exchangeable random effects
• Two stage with “spatial” correlation - city random effects have isotropic exponentially decaying autocorrelation function
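To make the third model concrete, here is a minimal sketch of an isotropic, exponentially decaying correlation matrix for city random effects. The function name, the planar coordinates, and the `range_km` parameter are hypothetical choices for illustration; the slides do not state the parameterization actually used.

```python
import numpy as np

def exponential_correlation(coords, range_km):
    """Isotropic exponential correlation: corr(i, j) = exp(-d_ij / range_km).

    coords: (n_cities, 2) planar coordinates in km (an assumption; real
    analyses would typically use great-circle distances).
    """
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))  # pairwise Euclidean distances
    return np.exp(-dist / range_km)

# Hypothetical coordinates for three cities.
coords = [(0.0, 0.0), (300.0, 0.0), (0.0, 400.0)]
R = exponential_correlation(coords, range_km=500.0)
print(np.round(R, 3))
```

"Isotropic" means the correlation depends only on the distance between two cities, not on direction; the exchangeable two-stage model corresponds to the limiting case with no distance-dependent correlation.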
Joint Estimation of 90 City Slopes With Spatial Model
• Approximate the conditional distribution of each city estimate given its true value by a Gaussian model with mean and variance equal to the mle and inverse of Fisher information under an over-dispersed Poisson model
• No borrowing of strength across cities for estimation of smooth functions of time and temperature (a full Bayesian analysis with “infinite” prior variances for these terms)
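The Gaussian approximation above leads to a simple normal-normal hierarchy. The sketch below pools city estimates under the exchangeable ("two stage") version, treating the between-city variance `tau2` as known and plugging in the pooled mean. That is a simplification assumed here for illustration; the talk describes an MCMC implementation with priors on the variance components. All names are hypothetical.

```python
import numpy as np

def pool_two_stage(beta_hat, var_hat, tau2):
    """Pool city-specific estimates under beta_hat_c ~ N(beta_c, var_hat_c),
    beta_c ~ N(mu, tau2), with tau2 treated as known (an assumption)."""
    beta_hat = np.asarray(beta_hat, dtype=float)
    var_hat = np.asarray(var_hat, dtype=float)
    w = 1.0 / (var_hat + tau2)               # precision weights
    mu = np.sum(w * beta_hat) / np.sum(w)    # pooled (national) estimate
    mu_var = 1.0 / np.sum(w)
    shrink = tau2 / (tau2 + var_hat)         # weight on each city's own MLE
    beta_post = shrink * beta_hat + (1.0 - shrink) * mu
    return mu, mu_var, beta_post

# Hypothetical city MLEs and their variances (inverse Fisher information).
mu, mu_var, beta_post = pool_two_stage(
    beta_hat=[0.2, 0.6, 1.0], var_hat=[0.04, 0.09, 0.25], tau2=0.05
)
print(mu, mu_var, beta_post)
```

Cities with noisier estimates (larger `var_hat`) are pulled more strongly toward the pooled mean, which is the sense in which the joint model borrows strength for the slopes.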
Joint Estimation
• MCMC implementation with proper priors for the variance components
– Standard uninformative priors are not
– Half Gaussians with large variances on ^2
• Have compared inferences to full Bayes analysis in a parametric analogue – no difference
Scientific and Statistical Issues
1. Model for the baseline frailty process and other unmeasured confounder processes in space and time
– personal variables (smoking, exercise)
– city-specific variables (demographics, medical services)
– influenza epidemics
2. Co-pollutants
3. Public health significance: “harvesting?”
4. Distributed lags
5. Reproducible research
1. Model for Spatial Time Series
• By collecting people across a large city, central limit theorem smooths out individual behaviors and produces a temporally smooth nuisance function
• Ignore the spatial correlation in mortality process and estimate city-specific relative rates
• Model spatial associations among rate estimates instead of modeling associations among the mortality events themselves
2. Co-pollutants
Recent Testimony on the EPA Proposed Decision on Particulate Matter
Suresh H. Moolgavkar, M.D., Ph.D.
Member, Fred Hutchinson Cancer Research Center; Professor of Epidemiology and Biostatistics, University of Washington - Leading Industry Consultant
“the potential for uncontrolled confounding by co-pollutants currently preclude the conclusion that the particulate component of air pollution is causally associated with adverse effects on human health.”
Co-pollutants
• Estimated the same model with
– PM10 + ozone
– PM10 + ozone + NO2
– PM10 + ozone + SO2
– PM10 + ozone + CO
• Pooled data over the largest 20 cities that tell most of the story
3. Public Health Significance
• Harvesting idea
– Only the very frail could possibly die from air pollution
– They would have died anyway in a few days
– Air pollution kills, but causes only a trivial loss of quality days of life
• If true, we would expect associations only at shorter time scales
4. Distributed Lag Models
• NMMAPS described mortality as a function of air pollution u=1 (or 0,2,3) days before because PM data are only available every sixth day in most cities
• To capture the entire acute effect, must include pollution levels from previous week or two
• Two statistical-computational issues
– How to flexibly model the distributed lags
– How to contend with substantial missing covariate data
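A minimal sketch of an unconstrained distributed lag fit may help fix ideas. It builds a matrix of lagged exposures and estimates one coefficient per lag; the sum of the coefficients is the total effect. For simplicity it uses ordinary least squares on a continuous outcome and assumes complete daily exposure data, whereas the analyses in the talk use over-dispersed Poisson regression and must contend with missing PM10 values. Function names are hypothetical.

```python
import numpy as np

def lag_matrix(x, max_lag):
    """Column u holds x lagged by u days; the first row is day max_lag."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    return np.column_stack([x[max_lag - u : n - u] for u in range(max_lag + 1)])

def fit_unconstrained_dlm(y, x, max_lag):
    """Least-squares lag coefficients and their sum (the total effect)."""
    lags = lag_matrix(x, max_lag)
    design = np.column_stack([np.ones(len(lags)), lags])  # add intercept
    outcome = np.asarray(y, dtype=float)[max_lag:]
    coef, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    theta = coef[1:]
    return theta, theta.sum()

# Simulated example: exposure affects the outcome at lags 0, 1 and 2.
rng = np.random.default_rng(0)
x = rng.normal(size=500)
true_theta = np.array([0.5, 0.3, 0.1])
y = np.zeros(500)
y[2:] = lag_matrix(x, 2) @ true_theta + 0.1 * rng.normal(size=498)
theta_hat, total_effect = fit_unconstrained_dlm(y, x, max_lag=2)
print(theta_hat, total_effect)
```

With a week or two of lags and highly autocorrelated exposure, the unconstrained coefficients become noisy, which is the motivation for the spline and Bayesian approaches discussed next.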
Distributed Lag Function
• Coefficient at lag i: effect of a unit increase in PM10 i days ago on today’s mortality (e.g., lag 7: PM10 7 days ago)
• Sum of the lag coefficients over i = ‘total effect’
[Figure: Example DLMs for PM10 on Mortality, Chicago 1987-2000. Estimated lag coefficients by lag (0 to 14 days). Legend, with total effects: max likelihood (-0.00038), natural spline (-0.00042), smoothing spline (-0.00038), smoothing spline (-0.00038).]
Prior Knowledge of DL Function
1. No knowledge of early lag effects
2. Lag effects must eventually go to zero
3. Lag effects get smoother further back in time
Our approach: construct the prior so as to reflect 1-3
Constructing Distributed Lag Prior
1. No knowledge of early lag effects
2. Lag effects must eventually go to zero
Large Variances → Small Variances
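One way to encode this prior knowledge is through a Gaussian prior covariance for the lag coefficients whose variances shrink with lag and whose correlations between neighboring lags grow with lag. The sketch below is only an illustration: the exponential decay forms and the hyperparameter names `eta_var` and `eta_corr` are assumptions made here, not the construction used by the authors.

```python
import numpy as np

def distributed_lag_prior_cov(max_lag, sigma2, eta_var, eta_corr):
    """Hypothetical prior covariance for lag coefficients 0..max_lag.

    Variances decay as sigma2 * exp(-eta_var * lag), so late lags are pulled
    to zero. Correlations come from W^2 + (1 - w)(1 - w)^T with
    w = exp(-eta_corr * lag): lag 0 is uncorrelated with the rest, and
    late lags are nearly perfectly correlated (smooth).
    """
    lags = np.arange(max_lag + 1)
    variances = sigma2 * np.exp(-eta_var * lags)
    w = np.exp(-eta_corr * lags)
    c = np.diag(w ** 2) + np.outer(1.0 - w, 1.0 - w)  # positive semidefinite
    d = np.sqrt(np.diag(c))
    corr = c / np.outer(d, d)                          # rescale to unit diagonal
    sd = np.sqrt(variances)
    return corr * np.outer(sd, sd)

cov = distributed_lag_prior_cov(max_lag=14, sigma2=1.0, eta_var=0.3, eta_corr=0.3)
print(np.round(np.diag(cov), 3))  # large variances -> small variances
```

Placing a grid of values on the two hyperparameters and averaging over them is one natural way to arrive at model-averaged lag curves, with one hyperparameter governing variance and the other correlation.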
[Figure: Bayesian Averaged Dist Lags of PM10 on Mort (Chicago). Panels: grid of values over the prior correlation and variance settings; estimated lag coefficients by lag (0 to 14 days); histogram of the Total Effect, average total effect = -2e-04.]
[Figure: Bayesian Averaged Dist Lags of PM10 on Mort (Detroit). Panels: grid of values over the prior correlation and variance settings; estimated lag coefficients by lag (0 to 14 days); histogram of the Total Effect, average total effect = 2e-04.]
Toward Reproducible Epidemiologic Research (RER)
• U.S. EPA setting national policy about air pollution based on acute and chronic disease studies – lots of $$ at stake
• Research conducted in the context of an adversarial debate about whether current levels of pollution cause mortality – credibility of epidemiologic evidence
Convergence Problem
• NMMAPS estimated the city-specific relative rates using Generalized Additive Models (gam) in S-plus
• gam relies upon several parameters, four of which control the decision of when to declare convergence of the estimation algorithm
• Five years into the work, we discovered that the default parameters we used were too lax for our application
• In addition, Ramsay et al. discovered that gam underestimates the standard errors of the relative rate estimates
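The effect of a lax convergence criterion can be illustrated with a toy analogue. The sketch below runs backfitting with two linear "smoothers" on strongly correlated covariates and stops when the coefficients change by less than a tolerance. It is not the S-plus gam algorithm and the tolerances are not its defaults; the data, names, and settings are all assumptions made for illustration.

```python
import numpy as np

def backfit_linear(y, x1, x2, tol, max_iter=100_000):
    """Backfitting with two linear fits; stop when coefficients move < tol."""
    b1 = b2 = 0.0
    for iteration in range(1, max_iter + 1):
        new_b1 = x1 @ (y - b2 * x2) / (x1 @ x1)      # fit x1 to partial residual
        new_b2 = x2 @ (y - new_b1 * x1) / (x2 @ x2)  # fit x2 to partial residual
        change = max(abs(new_b1 - b1), abs(new_b2 - b2))
        b1, b2 = new_b1, new_b2
        if change < tol:
            break
    return b1, b2, iteration

def simulate_concurved_data(seed=0, n=1000):
    """Two nearly collinear, centered covariates and a noisy linear outcome."""
    rng = np.random.default_rng(seed)
    x1 = rng.normal(size=n)
    x2 = x1 + 0.1 * rng.normal(size=n)
    x1 = x1 - x1.mean()
    x2 = x2 - x2.mean()
    y = x1 + x2 + 0.1 * rng.normal(size=n)
    return y - y.mean(), x1, x2

y, x1, x2 = simulate_concurved_data()
exact = np.linalg.lstsq(np.column_stack([x1, x2]), y, rcond=None)[0]
lax = backfit_linear(y, x1, x2, tol=1e-3)
strict = backfit_linear(y, x1, x2, tol=1e-10)
print("least squares:", exact)
print("lax tolerance:", lax)
print("strict tolerance:", strict)
```

Because each backfitting step changes the coefficients only slightly when the covariates are highly correlated, a small step size does not imply the estimate is close to its limit, so the lax run stops well short of the least-squares solution.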
[Figure: Model Sensitivity: Relative Rate estimates for GAM (default and strict) versus GLM. Panels: GAM (default) versus GLM estimates; GAM (strict) versus GLM estimates. Source: Dominici, McDermott, Zeger, Samet, AJE 2002.]
“(A)lthough many questions remain about how fine particles kill people, the NMMAPS study shows there’s no mistaking that PM is the culprit”
NMMAPS in Science, July 2000
Understatement of statistical uncertainty in the press
Toward Reproducibility in iHAPSS
• Post papers (tech reports) on iHAPSS web-site
• Hyperlink main results in paper (tables, figures) to
– Statistical computing environment (R) with:
  • program that generates the results
  • datafile used by the program to generate the results
• Give user opportunity to alter the analyses
– In this computing environment
– In their own environment?
R as a Platform for Distributing Data
• Convenient online help system for documenting datasets
• Vignette system for more detailed descriptions of data or code
• Functions can be provided for handling data
• Data can be delivered as a single unit/package, rather than in separate (possibly unlinked) pieces
NMMAPSdata
• Preprocessing functions for setting up the database to reproduce recent NMMAPS findings
– basicNMMAPS: analysis of PM10 and mortality
– seasonal: estimating seasonally varying effects of PM10
– tempDLM: distributed lag models for temperature
NMMAPSdata Index
• Number of U.S. cities: 108
• Number of days of observations: 5114
• Number of age categories: 3
• Number of variables: 291
• Database size (uncompressed): 2.5GB
Toward Reproducibility of Epidemiologic Research
• iHAPSS as a model
• Journals require that published papers be accompanied by programs/data necessary to reproduce their results
• Next steps to move the field in this direction
Main Points Once Again
• Reviewed the epidemiologic evidence for an association of particulate air pollution and mortality
– Cohort studies: RR=1.25 across range of exposures
– Time series studies:
• Mortality in space and time
– Summarize over time, then analyze in space
Main Points Once Again
• Value of Bayes estimates of maps of relative risks
• Time-scale specific relative risks
• Distributed lag models
• Reproducible Epidemiologic Research
Science Statistics
Testimony on the EPA Proposed Decision on Particulate Matter
Suresh H. Moolgavkar, M.D., Ph.D.
Professor of Epidemiology and Biostatistics, University of Washington
Industry Consultant
“The proposed new regulations for particulate matter are based on the assumption that the magnitude of the associations between these pollutants and adverse human health effects reported in some epidemiologic studies is predictive of the gains in human health that would accrue by lowering ambient concentrations. The evidence simply does not support this assumption. Briefly, the dearth of toxicological information, the absence of biological understanding of underlying mechanism, and the potential for uncontrolled confounding by co-pollutants currently preclude the conclusion that the particulate component of air pollution is causally associated with adverse effects on human health.”