Statistics at the EPA
Barry D. Nussbaum, Chief [email protected]
Presented to COPAFS
September 21, 2012
U.S. Environmental Protection Agency
Where Do We Get Our Data
• Monitoring
• Administrative Data
– Permits
– Required Submissions
– We may be the league leader in this
Uses of Data
• Like most agencies, we never know who will use what for whatever
• I never met a datum I didn’t like, BUT
• Data are neither good nor bad, but may be good or bad for a particular use
Toxics Release Inventory
• A great example
• Annual reporting of toxics released, recycled,
• 20,000 reporting entities
• 650 toxic chemicals
• Size requirement
• No mobile sources
• Engineering estimates
Other Uses of TRI Data
• Financial Sector
– Used by some mutual funds for “social responsibility”
• Labor Unions
– Used in contract negotiations
• Internal Revenue
– Used for tax on CFCs
• Internal Processing
– Reactions to filing led to lower releases
Has anyone written about it?
• Several books
• Many research articles (EPA funded 40)
– Impact of information disclosure
– Factors driving firms to adopt environmental policies
– Effects of pollution prevention efforts
– Relationship between parents and subsidiaries
Some New Items
• Proving tests equivalent
• Data we can’t detect
• Bayes
• Social Media
• Geoplatform
Equivalence of Tests
• Many of our regulations incorporate testing methodology
• What happens when someone has a better, cheaper, quicker test?
• Showing equivalence is the opposite of what we learn all through statistical studies
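To make the reversal concrete, the two setups can be written side by side. This is a sketch only: the symbols $\mu_{\text{new}}$ and $\mu_{\text{ref}}$ (mean results of the new and reference methods) and the equivalence margin $\delta > 0$ are assumptions introduced here, not notation from the slides.

```latex
% Usual difference test: equality sits in the null hypothesis
H_0:\ \mu_{\text{new}} = \mu_{\text{ref}}
\qquad\text{vs.}\qquad
H_1:\ \mu_{\text{new}} \neq \mu_{\text{ref}}

% Equivalence test: a meaningful difference sits in the null hypothesis
H_0:\ \lvert \mu_{\text{new}} - \mu_{\text{ref}} \rvert \ge \delta
\qquad\text{vs.}\qquad
H_1:\ \lvert \mu_{\text{new}} - \mu_{\text{ref}} \rvert < \delta
```

Failing to reject the first null is not evidence of equivalence; rejecting the second one is.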
Methods to Show Equivalence
• Two one-sided t-tests (TOST)
• Tricky when a new method has some physical restrictions
• Equality of
– Means
– Variances
– Covariances
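A minimal sketch of the two one-sided tests idea for means, assuming paired measurements of the same samples by a reference and a candidate method. To stay within the standard library it swaps in a large-sample normal approximation for the t distribution, so it is not the exact t-based procedure. The function name, the `margin` parameter, and the data are all hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def tost_paired(reference, candidate, margin, alpha=0.05):
    """Two one-sided tests on paired differences (normal approximation).

    H0: |mean difference| >= margin   vs.   H1: |mean difference| < margin.
    Returns (p_value, equivalent); p_value is the larger one-sided p-value.
    """
    diffs = [c - r for r, c in zip(reference, candidate)]
    d_bar = mean(diffs)
    se = stdev(diffs) / sqrt(len(diffs))
    z = NormalDist()
    # Test 1: H0 is "mean difference <= -margin".
    p_lower = 1.0 - z.cdf((d_bar + margin) / se)
    # Test 2: H0 is "mean difference >= +margin".
    p_upper = z.cdf((d_bar - margin) / se)
    p_value = max(p_lower, p_upper)
    # Equivalence is declared only when BOTH one-sided nulls are rejected.
    return p_value, p_value < alpha


# Made-up paired results: the candidate method wobbles +/-0.05 around the reference.
reference = [10.0 + 0.1 * i for i in range(30)]
candidate = [r + (0.05 if i % 2 == 0 else -0.05) for i, r in enumerate(reference)]

p_value, equivalent = tost_paired(reference, candidate, margin=0.5)
print(f"p = {p_value:.4g}, equivalent within margin: {equivalent}")
```

The choice of `margin` is the substantive decision: it states how large a difference between methods would matter, and it has to be set before looking at the data.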
Non-Detects
• A problem, paradoxically
• Occurs in many of our programs
• Several Techniques
– Half the Detection Limit
– All the Detection Limit
– Kaplan-Meier Techniques
– ProUCL
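The two substitution techniques listed above can be sketched as follows. This covers only the simple substitution rules, not Kaplan-Meier or ProUCL; the function name and the concentrations are hypothetical, and each non-detect is assumed to be recorded at its detection limit.

```python
from statistics import mean


def substitute_nondetects(values, detected, fraction):
    """Replace each non-detect with fraction * its detection limit.

    values[i] is the measured concentration when detected[i] is True;
    otherwise it is the detection limit reported for that sample.
    """
    return [v if d else fraction * v for v, d in zip(values, detected)]


# Made-up concentrations; False marks a result reported as "< detection limit".
values = [0.5, 1.2, 0.5, 3.4, 0.5, 2.1]
detected = [False, True, False, True, False, True]

half_dl = substitute_nondetects(values, detected, 0.5)  # half the detection limit
full_dl = substitute_nondetects(values, detected, 1.0)  # all the detection limit

print("mean with half-DL substitution:", mean(half_dl))
print("mean with full-DL substitution:", mean(full_dl))
```

The gap between the two means shows how much a summary statistic can depend on the substitution rule alone, which is one reason censored-data methods such as Kaplan-Meier are on the list.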
Our Foray Into Social Media
• Statipedia
– A wiki
• Confident Correlation
– A version of Facebook
• From Yammer to Office 365 Suite
– Collaborative tool
GeoPlatform
• We map everything
• An opportunity to collapse our stovepipes
– Water quality
– Air quality
– Hazardous waste
– Geographic entities
A New “Most Popular Question”
Statistical Software
• SAS
• S-Plus
• BMDP
• MATLAB
• STATISTICA
• Mathematica
• Systat
• Minitab
Gulf Oil Spill
EPA if Inland Water
Coast Guard if Coastal Water
Gulf Oil Spill
EPA’s Main Roles
• Collect samples along the shoreline and beyond for chemicals related to oil and dispersants in the air, water, and sediment
• Support and advise the Coast Guard efforts to clean the reclaimed oil and waste from the shoreline
• Closely monitor the effects of dispersants in the subsurface environment
EPA Data Collection
• Air Quality
– Air monitoring aircraft
– Air monitoring on the ground
• Water Samples
• Sediment Sampling
Some tough questions at the table
Siting decisions, using statistics
Monitor for air pollutants, such as H2S, along the coast
6 sites collecting H2S concentrations near Venice, LA
REALITY
Limitations in time, $, and equipment
Reduction in H2S site operation
Can you work with our data manager to retrieve all the historical H2S data from the Venice locations? What we need is a pretty basic analysis of where the values are highest. We are deploying a mobile trailer from region 5 to Venice and the V02 site has the best infrastructure. However, if its H2S distribution is lower than V03, V05, or V06, then we have a problem. Could this be handled by Tuesday? Region 5 is hitting the road and we need to determine the destination!
Fireworks or stats? Hmm…
[Figure: H2S readings over the July 4th weekend at sites V01, V02, V03, V04, V05, and V06. Plot annotations: best infrastructure; highest peaks; “overall high” site, consistently higher than other sites.]
We cannot count on V02 to represent the other sites for H2S
The answer isn’t always clear cut. Do we want to protect the public from the occasional high “peaks,” or do we want to keep track of consistently high H2S concentrations?
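The tension between "peaks" and "consistently high" can be illustrated by ranking sites on different summaries. This is a sketch under stated assumptions: the site names, readings, and function names below are invented for illustration and are not the Venice monitoring data or the analysis actually performed.

```python
from statistics import median, quantiles


def summarize_site(readings):
    """Return the median, 95th percentile, and maximum of one site's readings."""
    # The last of the 19 cut points for n=20 is the 95th percentile.
    p95 = quantiles(readings, n=20, method="inclusive")[-1]
    return {"median": median(readings), "p95": p95, "max": max(readings)}


def rank_sites(data, key):
    """Site names ordered from highest to lowest on the chosen summary."""
    summaries = {site: summarize_site(r) for site, r in data.items()}
    return sorted(summaries, key=lambda s: summaries[s][key], reverse=True)


# Made-up readings: site_A is steadily elevated, site_B is low with one spike.
data = {
    "site_A": [5, 6, 5, 6, 5, 6, 5, 6, 5, 6],
    "site_B": [1, 1, 1, 1, 1, 1, 1, 1, 1, 20],
    "site_C": [2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
}

print("ranked by median:", rank_sites(data, "median"))
print("ranked by max:   ", rank_sites(data, "max"))
```

The two rankings disagree on the top site, so the siting choice depends on which question is being asked, not only on the data.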
Presenting the Data
• In general, get it out fast
• Some description
…and look who has the answers