Understanding and Accounting for Method Variability in Whole Effluent Toxicity Applications Under the National Pollutant Discharge Elimination System Program June 30, 2000

  • Understanding and Accounting for Method Variability in Whole Effluent Toxicity Applications

    Under the National Pollutant Discharge Elimination System Program

    June 30, 2000

  • NOTICE AND DISCLAIMER

    This document provides guidance to NPDES regulatory authorities and persons interested in whole effluent toxicity testing. This document describes what EPA believes to be sources of variability in the conduct of whole effluent toxicity testing under the Clean Water Act. The document is designed to reflect national policy on these issues. The document does not, however, substitute for the Clean Water Act, an NPDES permit, or EPA or State regulations applicable to permits or whole effluent toxicity testing; nor is this document a permit or a regulation itself. The document does not and cannot impose any legally binding requirements on EPA, States, NPDES permittees, and/or laboratories conducting whole effluent toxicity testing for permittees (or for States in the evaluation of ambient water quality). EPA and State officials retain discretion to adopt approaches on a case-by-case basis that differ from this guidance based on an analysis of site-specific circumstances. This guidance may be revised without public notice to reflect changes in EPA policy.

  • TABLE OF CONTENTS

    Acknowledgments
    Executive Summary
    List of Acronyms and Abbreviations
    Glossary

    1.0 INTRODUCTION
        1.1 Background
        1.2 Effect of This Guidance
        1.3 Three Goals of This Document

    2.0 DEFINITION AND MEASUREMENT OF METHOD VARIABILITY IN WET TESTING
        2.1 Terms and Definitions
        2.2 Defining WET Test Variability
        2.3 Quantifying WET Test Variability

    3.0 VARIABILITY OF WET TEST METHODS
        3.1 Acquisition, Selection, and Quality Assurance of Data Presented in this Document
        3.2 Variability of EC25, LC50, and NOEC
            3.2.1 Within-Laboratory Variability of EC25, LC50, and NOEC
            3.2.2 Between-Laboratory Variability of EC25, LC50, and NOEC
        3.3 Variability of Endpoint Measurements
        3.4 Conclusions about Variability of WET Methods
            3.4.1 Variability of EC25, LC50, NOEC
            3.4.2 Variability of Endpoint Measurements

    4.0 VARIABILITY IN CONTEXT
        4.1 Society of Environmental Toxicology and Chemistry Pellston WET Workshop
            4.1.1 General Conclusions and Recommendations
            4.1.2 Conclusions about Data Precision
        4.2 Water Environment Research Foundation Study
        4.3 Minimizing Variability by Adhering to WET Methods
        4.4 Conclusion

    5.0 GUIDANCE TO REGULATORY AUTHORITIES, LABORATORIES AND PERMITTEES: GENERATING AND EVALUATING EFFECT CONCENTRATIONS
        5.1 Steps for Minimizing Test Method Variability
        5.2 Collecting Representative Effluent Samples
        5.3 Conducting the Biological Test Methods
            5.3.1 Quality Control Procedures
            5.3.2 Experimental Design
            5.3.3 Test Power to Detect Toxic Effects
        5.4 Test Acceptability Criteria
        5.5 Conducting the Statistical Analysis To Determine the Effect Concentration
        5.6 Chapter Conclusions

    6.0 GUIDANCE TO REGULATORY AUTHORITIES: DETERMINING REASONABLE POTENTIAL AND DERIVING WET PERMIT CONDITIONS
        6.1 Analytical and Sampling Variability in Calculations for Reasonable Potential and Permit Limits
            6.1.1 “Adjusting for Analytical Variability” in Calculations for Reasonable Potential and Permit Limits
            6.1.2 Analytical Variability and Self-monitoring Data
            6.1.3 Precision of WET Measurements and Estimates of Effluent CV
            6.1.4 Between-Laboratory Variability
        6.2 Determining Reasonable Potential and Establishing Effluent Limits
        6.3 Development of a Total Maximum Daily Load for WET
        6.4 Accounting for and Minimizing Variability in the Regulatory Decision Process
            6.4.1 Recommended Additional TACs: Lower and Upper Bounds for PMSD
            6.4.2 How To Determine the NOEC Using the Lower PMSD Bound
            6.4.3 Justification for Implementing the Test Sensitivity Bounds
            6.4.4 Guidance to Testing Laboratories on How to Achieve the Range of Performance for PMSD
        6.5 Additional Guidance That Regulatory Authorities Should Implement to Further Support the WET Program
        6.6 Chapter Conclusions

    7.0 CONCLUSIONS AND GUIDANCE TO LABORATORIES, PERMITTEES, AND REGULATORY AUTHORITIES
        7.1 General Conclusions
        7.2 Recommendations for Minimizing Variability and Its Effects
            7.2.1 Guidance to Toxicity Testing Laboratories
            7.2.2 Guidance to NPDES Permittees
        7.3 Guidance to Regulatory Authorities
        7.4 Future Directions

    8.0 BIBLIOGRAPHY

    Appendix A Interim Coefficients of Variation Observed Within Laboratories for Reference Toxicant Samples Analyzed Using EPA’s Promulgated Whole Effluent Toxicity Methods
    Appendix B Supplementary Information for Reference Toxicity Data
    Appendix C Sample Calculation of Permit Limits Using EPA’s Statistically-Based Methodology and Sample Permit Language
    Appendix D Frequently Asked Questions (FAQs)
    Appendix E Examples of Selected State WET Implementation Programs
    Appendix F Improvements in Minimizing WET Test Variability by the State of North Carolina
    Appendix G Analytical Variability in Reasonable Potential and Permit Limit Calculations

    List of Tables

    3-1 Promulgated WET Methods Included in This Report
    3-2 Quartiles (25th and 75th) and Median (50th) of the Within-Laboratory Values of CV for EC25 (Chronic Tests)
    3-3 Quartiles (25th and 75th) and Median (50th) of the Within-Laboratory Values of CV for LC50
    3-4 Quartiles (25th and 75th) and Median (50th) of the Within-Laboratory Values of CV for NOEC
    3-5 Estimates of Within-Laboratory and Between-Laboratory Components of Variability
    3-6 Range of Relative Variability for Endpoints of Promulgated WET Methods, Defined by the 10th and 90th Percentiles from the Data Set of Reference Toxicant Tests
    3-7 Number of Laboratories Having a Given Percent of Tests Exceeding the PMSD Upper Bound for the Sublethal Endpoint
    5-1 Tests for Chronic Toxicity: Power and Ability To Detect a Toxic Effect on the Sublethal Endpoint
    5-2 Power to Detect a 25-Percent Difference from the Control at the 90th Percentile PMSD
    6-1 Example of Applying the Lower Bound PMSD for the Chronic Ceriodaphnia Test with the Reproduction Endpoint

    List of Figures

    5-1 Steps to minimize WET test method variability
    6-1 Paradigm that incorporates the lower and upper percent minimum significant difference
    6-2 Implementing applications of upper and lower PMSD bounds for effluent testing requirements

  • ACKNOWLEDGEMENTS

    This guidance was prepared through the cooperative efforts of the U.S. Environmental Protection Agency’s (EPA) Office of Wastewater Management and Office of Science and Technology in the Office of Water, EPA’s Office of Research and Development, and EPA’s Office of Enforcement and Compliance Assurance. The Cadmus Group, Inc. provided support for the final document production.

    EPA Variability Workgroup

    Debra Denton, EPA Region 9, San Francisco, CA
    John Fox, EPA Office of Science and Technology, Washington, DC
    Florence Fulk, EPA Office of Research and Development, Cincinnati, OH
    Kathryn Greenwald, EPA Office of Enforcement and Compliance Assurance, Washington, DC
    Madonna Narvaez, EPA Region 10, Seattle, WA
    Teresa Norberg-King, EPA Office of Research and Development, Duluth, MN
    Laura Phillips, EPA Office of Wastewater Management, Washington, DC

    EPA Support Outside of the Variability Workgroup

    Gregory Currey, EPA Office of Wastewater Management, Washington, DC
    Margarete Heber, EPA Office of Wetlands, Oceans, and Watersheds, Washington, DC
    Phillip Jennings, EPA Region 6, Dallas, TX
    Henry Kahn, EPA Office of Science and Technology, Washington, DC
    Marion Kelly, EPA Office of Science and Technology, Washington, DC
    James Pendergast, EPA Office of Wetlands, Oceans, and Watersheds, Washington, DC
    Stephen Sweeney, EPA Office of General Counsel, Washington, DC
    William Telliard, EPA Office of Science and Technology, Washington, DC
    Robert Wood, EPA Office of Wastewater Management, Washington, DC
    Marcus Zobrist, EPA Region 2, New York, NY

    Contractor Support

    The Cadmus Group, Inc., Durham, NC

    Karalyn Colopy
    Susan Conbere
    Blanche Dean
    Penelope Kellar

    DynCorp Information and Enterprise Technology, Inc., Alexandria, VA

    Robert Brent

    Science Applications International Corporation, Columbia, MD

    Sidina Dedah
    Ruth Much
    Kathleen Stralka

    Assistance from States

    State Case Example and Chapter 4

    EPA especially thanks Larry Ausley and Matt Matthews of the North Carolina Department of Environment and Natural Resources, Division of Water Quality for preparing Chapter 4 (Variability in Context) and Appendix F (Improvements in Minimizing WET Test Variability by the State of North Carolina).

    State-Specific WET Program Implementation

    EPA appreciates the assistance and support from the following States in providing their State-specific approaches to the WET program implementation used for Appendix E (Examples of Selected State WET Implementation Programs) of this document.

    Betty Jane Boros-Russo, State of New Jersey
    Kari Fleming, State of Wisconsin
    Randall Marshall, State of Washington
    Matt Matthews, State of North Carolina
    Charlie Roth, State of Kentucky

    Additional Assistance

    We also thank the EPA Regions, States, and laboratories that provided the WET toxicity data used to develop this guidance document.

    EPA Peer Review

    Peer review was conducted following the EPA’s Science Policy Council Handbook for Peer Review (January 1998). The anonymous review comments are gratefully acknowledged, and the changes were incorporated, as appropriate.

  • EXECUTIVE SUMMARY

    Background

    The Federal Water Pollution Control Act, commonly known as the Clean Water Act, was enacted in 1972 with the objective of “restoring the chemical, physical, and biological integrity of the Nation’s waters.” Among the U.S. Environmental Protection Agency’s (EPA’s) efforts toward this objective is the National Pollutant Discharge Elimination System (NPDES) program. This program is designed to control toxic discharges, implement water quality standards, and restore waters to “fishable and swimmable” conditions. Point sources that discharge pollutants must do so under the terms and conditions of an NPDES permit. One approach EPA employs to control toxic pollutants under the NPDES permits program is using whole effluent toxicity (WET) controls.

    EPA is issuing this document both to address questions raised about WET test method variability and to satisfy a requirement of a July 1998 settlement agreement with litigants for the Western Coalition of Arid States (WestCAS) and Edison Electric Institute et al. This document was developed by an EPA workgroup consisting of staff from the Office of Water (OW) Headquarters, the Office of Enforcement and Compliance Assurance, the Office of Research and Development, and the EPA Regions. The document was externally peer reviewed in accordance with EPA’s peer review guidelines. The document addresses WET test method variability by identifying the potential sources of variance associated with WET testing, discussing how to minimize that variance, and describing how to address it within the NPDES permitting program. The document cites ongoing Agency and external research and scientific findings on this topic, particularly technical information that supports efforts to minimize WET test result variability.

    While the document provides recommendations on how to reduce or minimize WET test variability, the document does not supersede current Agency guidance, policy, or regulation, including EPA’s promulgated test methods (40 CFR Part 136), which remain in effect. EPA expects that implementation of the NPDES program and NPDES permits will continue to comply with regulatory requirements and follow applicable EPA guidance and policy.

    Why WET Testing?

    Whole effluent toxicity is the aggregate toxic effect of an aqueous sample (e.g., effluent, receiving water) measured directly by an aquatic toxicity test. Aquatic toxicity tests are laboratory experiments that measure the biological effect (e.g., growth, survival, and reproduction) of effluents or receiving waters on aquatic organisms. In aquatic toxicity tests, organisms of a particular species are held in test chambers and exposed to different concentrations of an aqueous sample, for example, a reference toxicant, an effluent, or a receiving water, and observations are made at predetermined exposure periods. At the end of the test, the responses of test organisms are used to estimate the effects of the toxicant or effluent.

    Whole effluent toxicity test results are an integral tool in the assessment of water quality. For the protection of aquatic life, the integrated strategy includes the use of three control approaches: the chemical-specific control approach, the WET control approach, and the biological criteria/bioassessment/bioassay approach. The primary advantage of using WET controls over individual, chemical-specific controls is that WET integrates the effects of all chemicals in the aqueous sample. Reliance solely on chemical-specific numeric criteria or biological criteria would result in only a partially effective State toxics control program. These toxicity tests therefore must be performed using best laboratory practices, and every effort must be made to enhance repeatability of the test method. This document presents EPA’s approaches to achieve the goals listed below.

    Effect of This Guidance

    This document clarifies several issues regarding WET variability and reaffirms EPA’s guidance in the Technical Support Document for Water Quality-Based Toxics Control (TSD, USEPA 1991a). This document provides NPDES regulatory authorities and all stakeholders, including permittees, with guidance and recommendations on how to address WET variability. EPA’s recommendations and conclusions are detailed in Chapter 7, and Appendix C provides sample NPDES permit language reflecting these recommendations.

    The most significant recommendation is to use and report the values for the percent minimum significant difference (PMSD) with all WET data results. The minimum significant difference (MSD) represents the smallest difference between the control mean and a treatment mean that leads to the statistical rejection of the null hypothesis (i.e., no toxicity) at each concentration of the WET test dilution series. The MSD provides an indication of within-test variability and test method sensitivity. Using this information, the regulatory authority and permittees can better evaluate WET test results.
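
    The relationship among MSD, PMSD, and within-test variability can be illustrated with a short calculation. The sketch below is not taken from this document. It assumes the Dunnett-style form commonly given in EPA's WET method manuals, MSD = d × sqrt(EMS) × sqrt(1/n_control + 1/n_treatment), and takes PMSD to be the MSD expressed as a percentage of the control mean. The function names, the replicate values, and the critical value d = 2.0 are hypothetical; in practice d would be read from a Dunnett table for the test's design.

```python
import math
from statistics import mean


def error_mean_square(groups):
    """Pooled within-group variance (EMS) from a one-way layout of replicates."""
    ss_within = 0.0
    degrees_of_freedom = 0
    for replicates in groups:
        group_mean = mean(replicates)
        ss_within += sum((x - group_mean) ** 2 for x in replicates)
        degrees_of_freedom += len(replicates) - 1
    return ss_within / degrees_of_freedom


def minimum_significant_difference(d_critical, ems, n_control, n_treatment):
    """Assumed form: MSD = d * sqrt(EMS) * sqrt(1/n_control + 1/n_treatment)."""
    return d_critical * math.sqrt(ems) * math.sqrt(1.0 / n_control + 1.0 / n_treatment)


def percent_msd(msd, control_mean):
    """PMSD: the MSD expressed as a percentage of the control mean."""
    return 100.0 * msd / control_mean


# Hypothetical replicate responses (for example, young per female).
control = [10.0, 12.0, 14.0, 12.0]
treatment = [8.0, 10.0, 12.0, 10.0]

# Placeholder critical value; a real analysis would take it from a Dunnett table.
D_CRITICAL = 2.0

ems = error_mean_square([control, treatment])
msd = minimum_significant_difference(D_CRITICAL, ems, len(control), len(treatment))
pmsd = percent_msd(msd, mean(control))
print(f"EMS = {ems:.3f}, MSD = {msd:.3f}, PMSD = {pmsd:.1f}%")
```

    Under these assumptions, a larger EMS (noisier replicates) or fewer replicates gives a larger PMSD, meaning that the test can detect only larger differences from the control.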

    This document makes several other recommendations: continue to use the TSD statistical approach without adjusting for test method variability, obtain sufficient representative effluent samples, verify effluent toxicity data against reference toxicant data, maintain clear communication between the regulatory authority and permittee, and maintain good laboratory checks and certification programs.

    Three Goals of This Document

    This document describes three goals EPA has defined to address issues surrounding WET variability. In addition, the document is intended to satisfy the requirements of a settlement agreement to resolve litigation over rulemaking to standardize WET testing procedures.

    1. Quantify the variability of promulgated test methods and report a coefficient of variation (CV) as a measure of test method variability (see Chapter 3 and Appendix A).

    2. Evaluate the statistical methods described in the Technical Support Document for Water Quality-Based Toxics Control (TSD) for determining the need for and deriving WET permit conditions (see Chapter 6 and Appendix G).

    3. Suggest guidance for regulatory authorities on approaches to address and minimize test method variability (Chapter 6). In addition, the document is intended to provide guidance to regulatory authorities, permittees, and testing laboratories on conducting the biological and statistical methods and evaluating test effect concentrations (Chapter 5).
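
    As an illustration of the first goal, a within-laboratory CV is the standard deviation of repeated effect concentrations divided by their mean. The minimal sketch below computes a CV for each laboratory's reference toxicant results and then summarizes the laboratories by quartiles and median, in the manner suggested by the table titles for Chapter 3. The laboratory names and EC25 values are invented for illustration, and the use of the sample standard deviation and of Python's default quantile method are assumptions, not EPA's stated procedure.

```python
from statistics import mean, quantiles, stdev


def coefficient_of_variation(values):
    """Return the sample standard deviation divided by the mean."""
    return stdev(values) / mean(values)


def summarize_lab_cvs(results_by_lab):
    """Compute one CV per laboratory, then quartiles of those CVs across labs."""
    cvs = {lab: coefficient_of_variation(v) for lab, v in results_by_lab.items()}
    q25, q50, q75 = quantiles(sorted(cvs.values()), n=4)
    return cvs, {"25th": q25, "median": q50, "75th": q75}


# Hypothetical EC25 estimates from repeated reference toxicant tests.
example_ec25 = {
    "lab_a": [8.0, 10.0, 12.0],
    "lab_b": [9.0, 10.0, 11.0],
    "lab_c": [6.0, 10.0, 14.0],
}

lab_cvs, summary = summarize_lab_cvs(example_ec25)
for lab, cv in lab_cvs.items():
    print(f"{lab}: CV = {cv:.2f}")
print(summary)
```

    A real analysis would use many more tests per laboratory than the three shown here; the small data set only keeps the arithmetic easy to follow.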

    Data Evaluated

    To examine variability in the WET test methods, EPA assembled a comprehensive data base from the EPA Regions, several States, and private laboratories, which together represent a widespread sampling of typical laboratories and laboratory practices. EPA applied several criteria to the data before accepting them, including detailed sample information, strict adherence to published EPA WET test methods, and test acceptability criteria (TAC). The resulting data base contains data from 75 laboratories for 23 methods for tests concluded between 1988 and 1999.

    Approach Taken To Evaluate Test Method Variability

    The variability that EPA is assessing is associated with replicate tests using reference toxicants and WET testing methods within analytical laboratories. The focus of this guidance is not to quantify test variability between laboratories or to quantify the total variability of WET tests conducted on effluents. Rather, the purpose is to quantify method variability within laboratories (repeatability) to enable NPDES programs to distinguish between variability caused by the testing method and variability associated with toxicity of multiple effluent samples taken from the same facility.

    To quantify test method variability within and between laboratories using this data base, EPA examined two key parameters: (1) the effect concentrations [effect concentration (EC25), lethal concentration (LC50), no observed effect concentration (NOEC)] estimated by the test, which are used to derive WET permit limits and evaluate self-monitoring data with those limits; and (2) the minimum significant difference (MSD), which summarizes the variability of organism responses at each test concentration within an individual test. The MSD represents the smallest difference that can be distinguished between the response of the control organisms and the response of the organisms exposed to the aqueous sample. The MSD provides an indication of within-test variability and test method sensitivity.

    Principal Conclusions

    The principal conclusions of this document follow.

    Evaluation of Test Method Variability

    • Comparisons of WET method precision with method precision for analytes commonly limited in NPDES permits clearly demonstrate that the variability of the promulgated WET methods is within the range of variability experienced in other types of analyses. Several independent researchers and studies also have concluded that method performance improves when prescribed methods are followed closely by experienced analysts (Section 4.3).

    • This document provides interim CVs for promulgated WET methods in Appendix A, Tables A-1 (acute methods) and A-2 (chronic methods), pending completion of between-laboratory studies, which may affect these interim CV estimates.

    Evaluation of Approach To Incorporate Test Method Variability

    • EPA’s TSD presents guidance for developing effluent limits that appropriately protect water quality, regarding both effluent variability and analytical variability, provided that the WET criteria and waste load allocation (WLA) are derived correctly (Section 6 and Appendix G).

    • EPA’s analysis of data gathered in the development of this document indicates that the TSD approach appropriately accounts for both effluent variability and method variability. EPA does not believe a reasonable alternative approach is available to determine a factor that would discount the effects of method variability using the TSD procedures, because the approach would not ensure adequate protection of water quality (Section 6.1.1 and Appendix G).

    Development of Guidance to Regulatory Authorities

    • EPA recommends that regulatory authorities implement the statistical approach as described in the TSD to evaluate effluent for reasonable potential and to derive WET limits or monitoring triggers (Section 6.1 and Appendix G).

    • EPA recommends that regulatory authorities calculate the facility-specific CVs using point estimate techniques to determine the need for and derive a permit limit for WET, even if self-monitoring data are to be determined using hypothesis testing techniques, for example, to determine a “no effect” concentration (“NOEC”). This document describes such facility-specific calculations (Section 3.4.1 and 6.2).

    Additional Recommendations and Guidance

    This document also provides recommendations and guidance on minimizing variability in three specific areas in order to generate sound WET test results: (1) obtaining a representative effluent sample; (2) conducting the toxicity tests properly to generate the biological endpoints; and (3) conducting the appropriate statistical analysis to obtain defensible effect concentrations (EC25, LC50, NOEC). If these recommendations are addressed, the reliability of the test endpoint values should improve.

    • Regulatory Authorities: Design a sampling program that collects representative effluent samples to fully characterize effluent variability for a specific facility over time (Sections 6.1.3 and 6.2).

    • Regulatory Authorities: Ensure proper application of WET statistical procedures and test methods (Sections 5.2 through 5.5).

    • Regulatory Authorities: Incorporate both the upper and lower bounds using the percent minimum significant difference (PMSD) to control and to minimize within-test method variability and increase test sensitivity. To achieve the PMSD upper bound, either the replication should increase or within-test method variability should decrease, or both (Section 6.4 and Table 3-6).

    • Testing Laboratories: Encourage WET testing laboratories to maintain control charts for PMSD and the control mean and report the PMSD with all WET test results (Section 5.3.1.1).

    • Regulatory Authorities: Participate in the National Environmental Laboratory Accreditation Program and routine performance audit inspections to evaluate laboratory performance (Section 5.3.1.1).

    • Regulatory Authorities: Incorporate EPA’s guidance on error rate assumption adjustments, concentration-response relationships, confidence intervals, acceptable dilution waters, how to block by parentage for the chronic Ceriodaphnia dubia test, and control of pH drift (USEPA 2000a).

  • LIST OF ACRONYMS AND ABBREVIATIONS

    ACR acute-to-chronic ratio

    AML average monthly limit

    ANOVA analysis of variance

    APHA-AWWA-WEF American Public Health Association-American Water Works Association-Water Environment Federation

    ASTM American Society for Testing and Materials

    BSAB Biomonitoring Science Advisory Board

    CCC criteria continuous concentration

    CFR Code of Federal Regulations

    CMC criteria maximum concentration

    CV coefficient of variation

    CWA Clean Water Act

    DMR discharge monitoring report

    EMS error mean square [also referred to as mean square error (MSE)]

    EPA U.S. Environmental Protection Agency (also, the Agency)

    FR Federal Register

    IC inhibition concentration

    IWC instream waste concentration (sometimes referred to as receiving water concentration)

    LC50 lethal concentration, 50 percent

    LOEC lowest observed effect concentration

    LTA long-term average (LTAa = acute LTA; LTAc = chronic LTA; LTAa,c = acute-to-chronic LTA)

    MDL maximum daily limit

    MSD minimum significant difference

    MSE mean square error [also referred to as error mean square (EMS)]

    MZ mixing zone

    NELAP National Environmental Laboratory Accreditation Program

    NOEC no observed effect concentration

    NPDES National Pollutant Discharge Elimination System

    NTRD National Toxicant Reference Database

    PAI Performance Audit Inspections

    PMSD percent minimum significant difference

    Note: These acronyms and abbreviations may have other meanings in other EPA programs or documents.


    QA quality assurance

    QC quality control

    rMSE square root of the mean square error

    RP reasonable potential

    RWC receiving water concentration (sometimes referred to as instream waste concentration)

    SCTAG Southern California Toxicity Assessment Group

    SETAC Society of Environmental Toxicology and Chemistry

    TAC test acceptability criteria

    TIE toxicity identification evaluation

    TMDL total maximum daily load

    TRE toxicity reduction evaluation

    TSD EPA’s Technical Support Document for Water Quality-based Toxics Control (March 1991, EPA505/2-90-001)

    TU toxic unit (TUa = acute toxicity; TUc = chronic toxicity)

    VF variability factor

    WET whole effluent toxicity

    WLA waste load allocation

    WQBEL water quality based effluent limit


  • GLOSSARY

    Acute Toxicity Test is a test to determine the concentration of effluent or ambient waters that causes an adverse effect (usually death) on a group of test organisms during a short-term exposure (e.g., 24, 48, or 96 hours). Acute toxicity is measured using statistical procedures (e.g., point estimate techniques or a t-test).

    Acute-to-Chronic Ratio (ACR) is the ratio of the acute toxicity of an effluent or a toxicant to its chronic toxicity. It is used as a factor for estimating chronic toxicity on the basis of acute toxicity data, or for estimating acute toxicity on the basis of chronic toxicity data.

    Ambient Toxicity is measured by a toxicity test on a sample collected from a receiving waterbody.

    ANOVA is analysis of variance.

    Average Monthly Limit (AML) is the calculated average monthly limit of waste load allocation assigned by a State or EPA for a particular facility.

    CCC are water quality criteria for chronic exposure (criteria continuous concentrations).

    Chronic Toxicity Test is a short-term test in which sublethal effects (e.g., reduced growth or reproduction) are usually measured in addition to lethality. Chronic toxicity is defined as TUc = 100/NOEC, TUc = 100/ECp, or TUc = 100/ICp.

    CMC are water quality criteria for acute exposures (criteria maximum concentration).

    Coefficient of Variation (CV) is a standard statistical measure of the relative variation of a distribution or set of data, defined as the standard deviation divided by the mean. It is also called the relative standard deviation (RSD). The CV can be used as a measure of precision within (within-laboratory) and between (between-laboratory) laboratories, or among replicates for each treatment concentration.

    Confidence Interval is the numerical interval constructed around a point estimate of a population parameter.

    Effect Concentration (EC) is a point estimate of the toxicant concentration that would cause an observable adverse effect (e.g., death, immobilization, or serious incapacitation) in a given percent of the test organisms, calculated from a continuous model (e.g., Probit Model). EC25 is a point estimate of the toxicant concentration that would cause an observable adverse effect in 25 percent of the test organisms.

    Hypothesis Testing is a statistical technique (e.g., Dunnett’s test) for determining whether a tested concentration is statistically different from the control. Endpoints determined from hypothesis testing are NOEC and LOEC. The two hypotheses commonly tested in WET are:

    Null hypothesis (Ho): The effluent is not toxic.

    Alternative hypothesis (Ha): The effluent is toxic.

    Inhibition Concentration (IC) is a point estimate of the toxicant concentration that would cause a given percent reduction in a non-lethal biological measurement (e.g., reproduction or growth), calculated from a continuous model (i.e., Interpolation Method). IC25 is a point estimate of the toxicant concentration that would cause a 25-percent reduction in a non-lethal biological measurement.

    Instream Waste Concentration (IWC) is the concentration of a toxicant in the receiving water after mixing. The IWC is the inverse of the dilution factor. It is sometimes referred to as the receiving water concentration (RWC).


    LC50 (lethal concentration, 50 percent) is the toxicant or effluent concentration that would cause death in 50 percent of the test organisms.

    Lowest Observed Effect Concentration (LOEC) is the lowest concentration of an effluent or toxicant that results in adverse effects on the test organisms (i.e., where the values for the observed endpoints are statistically different from the control).

    Long-term Averages (LTAs) of pollutant concentration or effluent toxicity are calculated from waste load allocations (WLAs), typically assuming that the WLA is a 99th percentile value (or another upper bound value) based on the lognormal distribution. One LTA is calculated for each WLA (typically an acute LTA and a chronic LTA for aquatic life protection). The LTA represents expected long-term average performance from the permitted facility required to achieve the associated WLA.

    Maximum Daily Limit (MDL) is the calculated maximum WLA assigned by a State or EPA for a particular facility.

    Minimum Significant Difference (MSD) is the magnitude of difference from control where the null hypothesis is rejected in a statistical test comparing a treatment with a control. MSD is based on the number of replicates, control performance, and power of the test.

    Mean Square Error (MSE) is the average dispersion of the items around the treatment means. It is an estimate of a common variance, the within variation, or variation among observations treated alike. [Also referred to as error mean square (EMS).]

    Mixing Zone is an area where an effluent discharge undergoes initial dilution and is extended to cover the secondary mixing in the ambient waterbody. A mixing zone is an allocated impact zone where water quality criteria can be exceeded as long as acutely toxic conditions are prevented.

    No Observed Effect Concentration (NOEC) is the highest tested concentration of an effluent or toxicant that causes no observable adverse effect on the test organisms (i.e., the highest concentration of toxicant at which the values for the observed responses are not statistically different from the controls).

    National Pollutant Discharge Elimination System (NPDES) program regulates discharges to the nation’s waters. Discharge permits issued under the NPDES program are required by EPA regulation to contain, where necessary, effluent limits based on water quality criteria for the protection of aquatic life and human health.

    Power is the probability of correctly detecting an actual toxic effect (i.e., declaring an effluent toxic when, in fact, it is toxic).

    Precision is a measure of reproducibility within a data set. Precision can be measured both within a laboratory (within-laboratory) and between laboratories (between-laboratory) using the same test method and toxicant.

    Quality Assurance (QA) is a practice in toxicity testing that addresses all activities affecting the quality of the final effluent toxicity data. QA includes practices such as effluent sampling and handling, source and condition of test organisms, equipment condition, test conditions, instrument calibration, replication, use of reference toxicants, recordkeeping, and data evaluation.

    Quality Control (QC) is the set of more focused, routine, day-to-day activities carried out as part of the overall QA program.

    Reasonable Potential (RP) is where an effluent is projected or calculated to cause an excursion above a water quality standard based on a number of factors.


    Reference Toxicant Test is a check of the sensitivity of the test organisms and the suitability of the test methodology. Reference toxicant data are part of a routine QA/QC program to evaluate the performance of laboratory personnel and the robustness and sensitivity of the test organisms.

    Significant Difference is defined as a statistically significant difference (e.g., 95 percent confidence level) in the means of two distributions of sampling results.

    Statistic is a computed or estimated quantity such as the mean, standard deviation, or coefficient of variation.

    Test Acceptability Criteria (TAC) are specific criteria for determining whether toxicity test results are acceptable. The effluent and reference toxicant must meet specific criteria as defined in the test method (e.g., for the Ceriodaphnia dubia survival and reproduction test, the criteria are as follows: the test must achieve at least 80 percent survival and an average of 15 young per surviving female in the controls).

    Total Maximum Daily Load (TMDL) is a determination of the amount of a pollutant, or property of a pollutant, from point, nonpoint, and natural background sources, including a margin of safety, that may be discharged to a water quality-limited waterbody.

    t-Test (formally Student’s t-Test) is a statistical analysis comparing two sets of replicate observations, in the case of WET, only two test concentrations (e.g., a control and 100 percent effluent). The purpose of this test is to determine if the means of the two sets of observations are different [e.g., if the 100-percent effluent concentration differs from the control (i.e., the test passes or fails)].

    Type I Error (alpha) is the rejection of the null hypothesis (Ho) when it is, in fact, true (i.e., determining that the effluent is toxic when the effluent is not toxic).

    Type II Error (beta) is the acceptance of the null hypothesis (Ho) when it is not true (i.e., determining that the effluent is not toxic when the effluent is toxic).

    Toxicity Test is a procedure to determine the toxicity of a chemical or an effluent using living organisms. A toxicity test measures the degree of effect of a specific chemical or effluent on exposed test organisms.

    Toxic Unit-Acute (TUa) is the reciprocal of the effluent concentration (i.e., TUa = 100/LC50) that causes 50 percent of the organisms to die by the end of an acute toxicity test.

    Toxic Unit-Chronic (TUc) is the reciprocal of the effluent concentration (e.g., TUc = 100/NOEC) that causes no observable effect (NOEC) on the test organisms by the end of a chronic toxicity test.

    Toxic Unit (TU) is a measure of toxicity in an effluent as determined by the acute toxicity units (TUa) or chronic toxicity units (TUc) measured. Higher TUs indicate greater toxicity.

    Toxicity Identification Evaluation (TIE) is a set of procedures used to identify the specific chemicals causing effluent toxicity.

    Toxicity Reduction Evaluation (TRE) is a site-specific study conducted in a step-wise process designed to identify the causative agents of effluent toxicity, isolate the source of toxicity, evaluate the effectiveness of toxicity control options, and then confirm the reduction in effluent toxicity.

    Variance is a measure of the dispersion in a set of values, defined as the sum of the squared deviations from the mean divided by their total number.

    Whole Effluent Toxicity (WET) is the total toxic effect of an effluent measured directly with a toxicity test.

    Waste Load Allocation (WLA) is the portion of a receiving water’s total maximum daily load that is allocated to one of its existing or future point sources of pollution.
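
    The toxic unit definitions in this glossary reduce to simple arithmetic, which the following minimal sketch illustrates. The function names and the example concentrations are hypothetical. The acute-to-chronic ratio is computed here as LC50 divided by NOEC, which is one common way of expressing the ratio described in the ACR entry and is an assumption rather than a formula stated in this glossary.

```python
def toxic_units(effect_concentration_percent):
    """Convert an effect concentration (percent effluent) to toxic units.

    TUa = 100 / LC50 and TUc = 100 / NOEC, as defined in the glossary.
    """
    if not 0 < effect_concentration_percent <= 100:
        raise ValueError("effect concentration must be in (0, 100] percent effluent")
    return 100.0 / effect_concentration_percent


def acute_to_chronic_ratio(lc50_percent, noec_percent):
    """Assumed form of the ACR: acute LC50 divided by chronic NOEC."""
    if lc50_percent <= 0 or noec_percent <= 0:
        raise ValueError("concentrations must be positive")
    return lc50_percent / noec_percent


# Hypothetical results for one effluent (percent effluent).
lc50 = 50.0
noec = 12.5

tu_acute = toxic_units(lc50)      # acute toxic units (TUa)
tu_chronic = toxic_units(noec)    # chronic toxic units (TUc)
acr = acute_to_chronic_ratio(lc50, noec)

print(tu_acute, tu_chronic, acr)
```

    Because a lower effect concentration means a more toxic effluent, the reciprocal form makes higher TU values correspond to greater toxicity, as the Toxic Unit entry notes.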


  • 1.0 INTRODUCTION

    1.1 Background

    The Federal Water Pollution Control Act, commonly known as the Clean Water Act (CWA), was enacted in 1972 with the objective of “restoring the chemical, physical, and biological integrity of the Nation’s waters.” Several goals and policies were established in the Act, including the following:

    • Eliminating the discharge of pollutants into navigable waters by 1985;

    • Wherever attainable, achieving an interim goal of water quality that provides for the protection and propagation of fish, shellfish, and wildlife, and provides for recreation in and on the water by November 1, 1983; and

    • Prohibiting the discharge of toxic pollutants in toxic amounts.

    In the 28 years since the CWA was enacted, the U.S. Environmental Protection Agency (EPA) and States authorized to administer EPA’s National Pollutant Discharge Elimination System (NPDES) permitting program have made significant progress toward achieving these goals. NPDES is designed to control toxic discharges, implement a water quality standards program, and restore waters to “fishable and swimmable” conditions. A point source that discharges pollutants to waters of the United States must do so under the terms and conditions of an NPDES permit. In setting these terms and conditions, EPA and the States have integrated their control of toxic pollutants through combined use of three approaches [Technical Support Document for Water Quality-based Toxics Control (USEPA 1991a, referred to as the TSD)]:

    • Chemical-specific controls,

    • Whole effluent toxicity (WET) controls, and

    • Biological criteria/bioassessments and bioassays.

    The WET approach to protection of water quality is the primary subject of this document.

    In 1989, EPA defined whole effluent toxicity as “the aggregate toxic effect of an effluent measured directly by an aquatic toxicity test” [54 Federal Register (FR) 23868 at 23895, June 2, 1989]. Aquatic toxicity tests are laboratory experiments that measure the biological effect (e.g., growth, survival, and reproduction) of effluents or receiving waters on aquatic organisms. In aquatic toxicity tests, groups of organisms of a particular species are held in test chambers and exposed to different concentrations of an aqueous test sample, for example, a reference toxicant, an effluent, or a receiving water. Observations are made at predetermined exposure periods. At the end of the test, the responses of test organisms are used to estimate the effects of the toxicant or effluent.

    In the early 1980s, EPA published methods (USEPA 1985, 1988, 1989) for estimating the short-term acute and chronic toxicity of effluents and receiving waters to freshwater and marine organisms. WET data gathered in the 1980s indicated that approximately 40 percent of NPDES facilities nationwide discharged an effluent with sufficient toxicity to cause water quality problems. Further reductions in the toxicity of wastewater discharges were necessary to achieve compliance with narrative water quality standards expressed as “no toxics in toxic amounts.” In response to these findings, EPA implemented a policy to reduce or eliminate toxic discharges. The Policy for the Development of Water Quality-based Permit Limitations for Toxic Pollutants (49 FR 9016, March 9, 1984) introduced EPA’s integrated toxics control program. To support this policy, EPA developed the TSD (USEPA 1991a). The TSD provides guidance to regulators in implementing WET testing requirements in NPDES permits. In 1989, EPA promulgated regulations specifying procedures for determining when water quality-based effluent limitations are required in NPDES permits [40 CFR, 122.44(d)]. On October 26, 1995, EPA promulgated WET test methods (USEPA 1993, 1994a, and 1994b) and added them to the list of EPA methods approved under Section 304(h) of the CWA (40 CFR, 136) for use in the NPDES program. Although the rulemaking was challenged in court, that challenge has been stayed pending completion of a settlement agreement. The rulemaking remains in force and effect unless and until EPA takes further action.

    1.2 Effect of This Guidance

    This document attempts to clarify several issues regarding WET variability and reaffirms EPA’s earlier guidance and recommendations published in the TSD (USEPA 1991a). This document is intended to provide NPDES regulatory authorities and all stakeholders, including permittees, with guidance and recommendations on how to understand and account for measurement variability in WET testing. The document’s recommendations and conclusions are detailed in Section 7. Appendix C provides sample NPDES permit language reflecting these recommendations.

    The most significant recommendation is to use and report the values for the percent minimum significant difference (PMSD) with all WET data results. The minimum significant difference (MSD) is the smallest difference that can be distinguished between the response of control organisms and the response of test organisms at each concentration of the WET test dilution series. The MSD provides an indication of the within-test variability and test method sensitivity. Using this information, the regulatory authority and permittees can better evaluate WET test results.

    This document also recommends the following:

    • Continue to use the EPA TSD statistical approach for NPDES permit limit development (no test method variability adjustments are needed);

    • Collect and evaluate a sufficient number of representative effluent samples;

    • Verify effluent toxicity data carefully along with reference toxicant data;

    • Maintain good communication between the regulatory authority and permittee throughout all phases of the permitting process;

    • Implement the PMSD to evaluate both WET and reference toxicant data to minimize within-test method variability and increase test sensitivity; and

    • Maintain checks on individual WET laboratory performance through good laboratory certification programs, which encourage experienced laboratories and skilled analysts in the toxicity testing program.

    1.3 Three Goals of This Document

    EPA prepared this document to achieve the following three goals:

    1. Quantify the variability of promulgated test methods and report a coefficient of variation (CV) as a measure of test method variability (see Chapter 3 and Appendix A).


    2. Evaluate the statistical methods described in the Technical Support Document for Water Quality-Based Toxics Control (TSD) for determining the need for and deriving WET permit conditions (see Chapter 6 and Appendix G).

    3. Suggest guidance for regulatory authorities on approaches to address and minimize test method variability (Chapter 6). In addition, the document is intended to provide guidance to regulatory authorities, permittees, and testing laboratories on conducting the biological and statistical methods and evaluating test effect concentrations (Chapter 5).

    This document does not address effluent variability. It does, however, discuss how handling effluent samples can affect tests. Chapter 2 provides definitions of terms used and discusses the ways in which variability can be quantified. Chapter 3 describes the variability of the effect concentration estimates (EC25, LC50, and NOEC) and the variability of endpoint measurements (survival, growth, and reproduction). Chapter 4 discusses WET variability in the context of chemical-specific method variability. Chapter 5 provides guidance to permittees, testing laboratories, and regulatory authorities to minimize test method variability. Chapter 6 provides guidance to regulatory authorities on how to determine reasonable potential (RP) and derive permit limits or monitoring triggers and evaluate self-monitoring data. Chapter 7 presents EPA’s principal conclusions. Chapter 8 is a bibliography containing a list of documents cited herein and additional reading material.


  • 2.0 DEFINITION AND MEASUREMENT OF METHOD VARIABILITY IN WET TESTING

    The terms used to express toxicity test results are defined in this chapter, and methods for quantifying WET test method variability are discussed. Additional terms used throughout this document, along with their definitions, are provided in the Glossary as part of the front matter of this document.

    2.1 Terms and Definitions

    Biological endpoints are the biological observations recorded when conducting toxicity tests. These observations may include the number of surviving organisms or the number of young produced. There are two basic types of biological endpoints: responses recorded as response/no response (e.g., dead or alive) are quantal data; responses recorded as a measured response (e.g., weight) or as a count (e.g., number of young produced) are considered continuous data. For most WET tests, the observations for each tested concentration are combined and then reported as an average or percentage to represent the biological endpoint. For example, the fathead minnow larval survival and growth chronic test method has two biological endpoints (i.e., percent survival and average dry weight for each test concentration).

    Effect concentrations are concentrations of a test material (i.e., effluent, reference toxicant, receiving water) derived from the observed biological endpoints through data analysis using either hypothesis testing procedures or point estimate techniques. Effect concentrations derived using point estimation techniques represent the concentration of a test material at which a predetermined level of effect occurs. For example, LC50 is the lethal concentration at which 50 percent of the organisms respond. Effect concentrations commonly estimated for WET methods are LC50, EC50 (effect concentration at which a 50-percent effect occurs), and IC25 (inhibition concentration at which a 25-percent effect occurs). Hypothesis test methods are used to determine the no observed effect concentration (NOEC). The NOEC represents the highest tested concentration at which the response is not significantly different from the control response. Multiple statistical endpoints can be derived for each WET method. For example, the endpoints for the fathead minnow larval survival and growth chronic test can be reported as an EC25 for growth, an NOEC for growth, an LC50 (or EC50) for survival, and an NOEC for survival.

    2.2 Defining WET Test Variability

    As with any measurement process, WET tests have a degree of variability associated with the test method performance. Three measures of variability related to WET tests are within-test variability, within-laboratory variability, and between-laboratory variability.

    • Within-test (intra-test) variability is the variability in test organism response within a concentration averaged across all concentrations of the test material in a single test.

    • Within-laboratory (intra-laboratory) variability is the variability that is measured when tests are conducted using specific methods under reasonably constant conditions in the same laboratory. Within-laboratory variability, as used in this document, includes within-test variability. The American Society for Testing and Materials (ASTM) uses the term “repeatability” to describe within-laboratory variability. Repeatability is estimated (as a sample variance or standard deviation) by repeating a test method under realistically constant conditions within a single laboratory.

    • Between-laboratory (inter-laboratory) variability is the variability between laboratories. It is measured by obtaining results from different laboratories using the same test method and the same test material (e.g., reference toxicant). Between-laboratory variability, as used in this document, does not include the within-laboratory component of variance. ASTM uses the term “reproducibility” to describe between-laboratory variability. Reproducibility is estimated by having nearly identical test samples (duplicates or splits) analyzed by multiple laboratories using similar standard methods. Although reproducibility is generally synonymous with between-laboratory variability, estimates of reproducibility may combine within-laboratory and between-laboratory components of variance, making between-laboratory variability numerically larger than within-laboratory variability as defined above.

    For purposes of consistency, EPA uses the terms within-laboratory and between-laboratory variability throughout this document.
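
    The distinction drawn above between the within-laboratory and between-laboratory components of variance can be illustrated with a short sketch. It assumes a balanced design (every laboratory reports the same number of tests on the same reference toxicant) and uses the standard one-way analysis of variance estimators; this is an illustration of the idea, not the procedure EPA used for this document. The function name and the data are hypothetical.

```python
from statistics import mean


def variance_components(results_by_lab):
    """Return (within_lab_variance, between_lab_variance) for a balanced design.

    results_by_lab maps a laboratory label to its list of effect
    concentrations (e.g., EC25 values) for the same reference toxicant.
    """
    groups = list(results_by_lab.values())
    k = len(groups)
    n = len(groups[0]) if groups else 0
    if k < 2 or n < 2 or any(len(g) != n for g in groups):
        raise ValueError("need at least 2 labs, each with the same number (>= 2) of tests")

    lab_means = [mean(g) for g in groups]
    grand_mean = mean(lab_means)

    # Mean square within laboratories: pooled variance of tests around lab means.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, lab_means) for x in g)
    ms_within = ss_within / (k * (n - 1))

    # Mean square between laboratories: spread of lab means around the grand mean.
    ms_between = n * sum((m - grand_mean) ** 2 for m in lab_means) / (k - 1)

    # The between-lab component excludes the within-lab contribution;
    # negative estimates are truncated to zero.
    between = max(0.0, (ms_between - ms_within) / n)
    return ms_within, between


# Hypothetical EC25 values (same units) from three laboratories.
example = {
    "lab_a": [10.0, 12.0, 14.0],
    "lab_b": [20.0, 22.0, 24.0],
    "lab_c": [15.0, 17.0, 19.0],
}
within_var, between_var = variance_components(example)
print(within_var, between_var)
```

    Adding the two components gives a total that corresponds to the combined "reproducibility" estimate mentioned above, which is why such an estimate is larger than the between-laboratory component alone.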

    Numerous factors can affect the variability of any toxicity test method. These factors include the number of test organisms, the number of treatment replicates, randomization techniques, the source and health of the test organisms, the type of food used, laboratory environmental conditions, and dilution water quality. The experience of the analyst performing the test, analyzing the data, and interpreting the results may also affect variability (Grothe et al. 1996, Fulk 1996).

    2.3 Quantifying WET Test Variability

    Historically, information on the variability of toxicity tests has been developed using effect concentrations, such as the NOEC, EC25, EC50, and LC50 for survival, fecundity, and growth. Variability measures should be quantified based on the end use of the data (i.e., effect concentrations) and be directly related to the WET permit requirement. Typically, the effect concentrations are the endpoints used for evaluating self-monitoring results. The variability of the effect concentrations is quantified by obtaining multiple test results under similar test conditions using the same test material. For example, the sample standard deviation and mean for EC25 obtained from multiple monthly reference toxicant tests for the fathead minnow survival and growth chronic test conducted at one laboratory would quantify “within-laboratory” variability for that laboratory. EPA used this approach to evaluate data for the development of this document (see Chapter 3).
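
    As a minimal sketch of the approach just described, the within-laboratory coefficient of variation can be computed from a series of EC25 values as the sample standard deviation divided by the mean. The function name and the monthly values below are hypothetical and are not taken from EPA's data set.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (the CV)."""
    if len(values) < 2:
        raise ValueError("at least two test results are needed")
    average = mean(values)
    if average == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return stdev(values) / average


# Hypothetical EC25 values from monthly reference toxicant tests at one laboratory.
monthly_ec25 = [40.0, 50.0, 60.0]
cv = coefficient_of_variation(monthly_ec25)
print(f"within-laboratory CV = {cv:.2f}")
```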

    Examining variability for each effect concentration of each biological endpoint for each test method is essential. The biological endpoints may be different for various toxicants and effluents. One biological endpoint, such as reproduction, may be more sensitive to a certain toxicant than another endpoint, such as survival. That sensitivity may be reversed for a different toxicant. Alternatively, an endpoint may be more sensitive to one toxicant than another toxicant.

    Three other measures of variability (which are not addressed in this document) that have been applied to WET tests are:

    1. Determine the variability of the biological endpoint response. For example, the variance of the biological response (e.g., growth and survival) can be calculated. This approach is useful, but does not quantify variability of the WET test effect concentration, which is important in the context of this document.

    2. Quantify the uncertainty of each test point estimate (e.g., the EC50, EC25, or LC50) using confidence intervals, which reflect within-test variability.

    3. Use the standard deviation to quantify the uncertainty in the mean of the replicate response at each concentration within a particular test. For example, laboratories can compare the standard deviations of the average weight of fathead minnow larvae in four chronic tests at one test concentration, such as 1 mg/L sodium chloride. These standard deviations may be pooled across all the concentrations when data have been transformed (if necessary) to give similar variances at each concentration. From the pooled variance, one may calculate a minimum significant difference (MSD) value, which is a useful indication of test sensitivity (see Chapters 3 and 5). In this document, the standard deviation at each concentration was not evaluated as a measure of variability. However, the MSD was considered as a measure of WET test variability.
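
    The calculation of an MSD from the pooled variance can be sketched as follows. The sketch assumes the conventional Dunnett-style form, MSD = d × sqrt(MSE) × sqrt(1/n_control + 1/n_treatment), and expresses the PMSD as the MSD taken as a percentage of the control mean. Neither formula is stated in this chapter, so both should be read as assumptions. The critical value d must be looked up in a Dunnett's table for the test design; the values used below are hypothetical.

```python
from math import sqrt


def minimum_significant_difference(d_critical, mse, n_control, n_treatment):
    """Assumed Dunnett-style MSD from the pooled within-test variance (MSE)."""
    if mse < 0 or n_control < 1 or n_treatment < 1:
        raise ValueError("MSE must be non-negative and replicate counts positive")
    return d_critical * sqrt(mse) * sqrt(1.0 / n_control + 1.0 / n_treatment)


def percent_msd(msd, control_mean):
    """MSD expressed as a percentage of the control mean (PMSD)."""
    if control_mean <= 0:
        raise ValueError("control mean must be positive")
    return 100.0 * msd / control_mean


# Hypothetical inputs: critical value from a table, pooled variance, replicates.
msd = minimum_significant_difference(d_critical=2.0, mse=0.04, n_control=2, n_treatment=2)
pmsd = percent_msd(msd, control_mean=2.0)
print(f"MSD = {msd:.3f}, PMSD = {pmsd:.1f} percent")
```

    A smaller PMSD indicates that the test can distinguish a smaller relative difference from the control, that is, lower within-test variability and greater sensitivity.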


  • 3.0 VARIABILITY OF WET TEST METHODS

    Chapter 3 describes the variability of effect concentration estimates (EC25, LC50, and NOEC) and endpoint measurements (survival, growth, and reproduction). For definitive studies of the variability of WET methods, readers should also refer to the TSD (USEPA 1991a, Part 1.3.3) and to WET methods manuals (USEPA 1993, 1994a, 1994b). EPA will complete and report on a new between-laboratory study of promulgated methods in 2000 or 2001.

    3.1 Acquisition, Selection, and Quality Assurance of Data Presented in This Document

    EPA solicited data for reference toxicant tests from laboratories that conduct WET tests and use reference toxicant testing as part of their quality control (QC) program. Reference toxicant testing is required, as specified in EPA toxicity test methods, to document laboratory performance over time for laboratories conducting self-monitoring tests. When laboratories are conducting effluent tests, at least one reference toxicant test must be conducted each month using the same toxicant, test concentrations, dilution water, and data analysis methods. These reference toxicant tests must be conducted using the same test conditions (type of dilution water, temperature, test protocol, and species) that are used for WET tests conducted by the laboratory.

    Reference toxicant tests were used to characterize method variability because, in contrast to effluent samples, fixed concentrations of known toxicants are used. Only with this standardization is it possible to conclude that variability of the effect concentration estimates is derived from the sources discussed above, rather than from changes in the toxicant.

    EPA received reference toxicant test data from several States, private laboratory sources, and the EPA Regions. Data sources used for these analyses include the EPA National Toxicant Reference Database (NTRD), the EPA Region 9 Toxicity Data Base, and laboratory bench sheets voluntarily submitted by independent sources. Although the data do not represent a random sample of laboratories or tests, they do represent a widespread sampling of typical laboratories and practices.

    EPA required that reference toxicant tests included in its data base meet the following four criteria:

    1. Test records documented the test method, organism, test date, laboratory, reference toxicant, and individual biological responses in the concentration series.

    2. Data for each replicate were provided as required in the published method using the current test method.

    3. The test used at least five toxicant concentrations and a control for the most commonly reported chronic toxicity test methods—(1) 1000.0, fathead minnow larval survival and growth; (2) 1002.0, Ceriodaphnia survival and reproduction; and (3) 1006.0, inland silverside survival and growth. For other chronic toxicity test methods, the test used at least four toxicant concentrations and a control because the methods permitted, in the recent past, the use of only four concentrations.

    4. EPA personnel or an EPA contractor calculated the effect concentration, verified that all test acceptability criteria (TAC) had been met, and verified that the statistical flowchart had been followed correctly. Thus, all summary statistics and estimates were calculated from the replicate data and strictly followed the most current EPA test methods.


    Details of data quality assurance and test acceptance are provided in a separate document, available at EPA’s Office of Water docket, located in the Office of Science and Technology [“Whole Effluent Toxicity (WET) Data Test Acceptance and Quality Assurance Protocol”]. An attachment to that document provides a laboratory-by-laboratory listing of quality assurance flags, test dates, and toxicant concentrations, as well as summary statistics by laboratory for the NOEC, EC25, and LC50 estimates and test endpoints (survival, growth, reproduction, etc.). Laboratories are not identified by name.

    The data set of reference toxicant tests includes information from 75 laboratories for 23 methods for tests conducted between 1988 and 1999. This document addresses, and provides specific guidance on, the variability of methods promulgated by EPA in 40 CFR Part 136 (Table 3-1). The data are also used to develop between-laboratory interim estimates of method variability for the promulgated methods (Appendix A). The Agency identifies these CVs as “interim;” EPA may revise some or all of these estimates based on between-laboratory studies to evaluate some of the promulgated test methods.

    The next section presents summary statistics for the promulgated methods. Summary statistics for all methods in the data set appear in Appendix B. For methods represented by a few laboratories, summary statistics should not be considered representative of method performance. For example, EPA’s Office of Water usually relies on acceptable data from at least six laboratories (USEPA 1996b) when it conducts a multi-laboratory study to quantify method performance. The data used here have not been obtained under conditions as rigorous as those applied to a between-laboratory study and for that reason, may overestimate variability, particularly for the extremes.

    Coefficients of variation are used as descriptive statistics for NOECs in this document. Because NOECs can take on only values that correspond to concentrations tested, the distribution (and CV) of NOECs can be influenced by the selection of experimental concentrations, as well as additional factors (e.g., within-test variability) that affect both NOECs and point estimates. This makes CVs for NOECs more uncertain than the CVs for point estimates, and the direction of this uncertainty is not uniformly toward larger or smaller CVs. Despite these confounding issues, CVs are used herein as the best available means of expressing the variability of interest in this document and for general comparisons among methods. Readers should be cautioned, however, that small differences in CVs between NOECs and point estimates may be artifactual; large differences are more likely to reflect real differences in variability (a definition of what is “small” or “large” would require a detailed statistical analysis and would depend upon the experimental and statistical details surrounding each comparison). NOECs can only be a fixed number of discrete values; the mean, standard deviation, and CV cannot be interpreted and applied as they are for a continuous variable such as the EC25 or EC50. For instance, the typical reference toxicant test might result in only three observed NOEC values, most of them at one or two concentrations. The mean will fall between tested concentrations, as will the stated confidence intervals; thus, these do not actually represent expected outcomes, only approximations of the expected outcome.
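    The discreteness of the NOEC can be seen in a small numerical example. The Python sketch below uses a hypothetical dilution series and hypothetical NOECs from ten tests; none of the numbers come from EPA's data base. It shows that only a few distinct NOEC values occur and that the mean NOEC need not be a tested concentration.

```python
import statistics

# Hypothetical dilution series and hypothetical NOECs from ten reference
# toxicant tests at one laboratory (all in the same concentration units).
tested_concentrations = [0.5, 1.0, 2.0, 4.0, 8.0]
noecs = [1.0, 1.0, 2.0, 2.0, 2.0, 2.0, 2.0, 4.0, 2.0, 1.0]

mean_noec = statistics.mean(noecs)
sd_noec = statistics.stdev(noecs)   # sample standard deviation
cv_noec = sd_noec / mean_noec       # CV as used descriptively in this document

distinct_values = sorted(set(noecs))
print("distinct NOEC values:", distinct_values)
print("mean NOEC:", mean_noec)
print("mean is a tested concentration:", mean_noec in tested_concentrations)
print("CV of NOEC:", round(cv_noec, 2))
```

    Because every NOEC is one of the tested concentrations, changing the dilution series alone would change the CV even if the underlying toxicity were identical.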

    As an alternative to CVs, ratios are used to quantify variability of EC25, EC50, and NOEC measurements in Appendix B. Ratios of measurements have been used previously to quantify and compare variability of NOEC and EC50 (Chapman et al. 1996b, Dhaliwal et al. 1997).

    3.2 Variability of EC25, LC50, and NOEC

    3.2.1 Within-Laboratory Variability of EC25, LC50, and NOEC

    This section characterizes the within-test and within-laboratory variability of effect concentration estimates. Tables 3-2 through 3-4 summarize variation across laboratories of the within-laboratory coefficients of variation (CVs), without respect to reference toxicant tested. Tables showing more extensive summaries appear in Appendix B (Tables B-1 through B-3).


    Table 3-1. Promulgated WET Methods Included in This Report

    Test Method No. | Test Method | EPA Data Base: Toxicants | EPA Data Base: Tests | EPA Data Base: Labs

    Freshwater Methods for Chronic Toxicity [a]
    1000.0 | Pimephales promelas, Fathead Minnow Larval Survival and Growth Test | Cd, Cr, Cu, KCl, NaCl, NaPCP, SDS | 205 | 19
    1001.0 | Pimephales promelas, Fathead Minnow Embryo-Larval Survival and Teratogenicity Test | (none) | 0 | 0
    1002.0 | Ceriodaphnia dubia, Water Flea Survival and Reproduction Test | Cd, Cu, KCl, NaCl, NaPCP | 393 | 33
    1003.0 | Selenastrum capricornutum [b], Green Alga Growth Test | Cu, NaCl, Zn | 85 | 9

    Marine & Estuarine Methods for Chronic Toxicity [c]
    1004.0 | Cyprinodon variegatus, Sheepshead Minnow Larval Survival and Growth Test | Cd, KCl | 57 | 5
    1005.0 | Cyprinodon variegatus, Sheepshead Minnow Embryo-larval Survival and Teratogenicity Test | (none) | 0 | 0
    1006.0 | Menidia beryllina, Inland Silverside Larval Survival and Growth Test | Cr, Cu, KCl, SDS | 193 | 16
    1007.0 | Americamysis (Mysidopsis) bahia, Mysid Survival, Growth, and Fecundity Test | Cr, Cu, KCl | 130 | 10
    1008.0 | Arbacia punctulata, Sea Urchin Fertilization Test | (none) | 0 | 0
    1009.0 | Champia parvula, Red Macroalga Reproduction Test | Cu, SDS | 23 | 2

    Methods for Acute Toxicity [d,e]
    2000.0 | Fathead Minnow Survival Test | Cd, Cu, KCl, NaCl, NaPCP | 217 | 21
    2002.0 | Ceriodaphnia dubia Survival Test | Cd, Cu, KCl, NaCl, NaPCP | 241 | 23
    2004.0 | Sheepshead Minnow Survival Test | SDS | 65 | 3
    2006.0 | Inland Silverside Survival Test | Cd, KCl, SDS | 48 | 5
    2007.0 | Mysid (A. bahia) Survival Test | Cd, Cu, SDS | 32 | 3
    2011.0 | Mysid (H. costata) Survival Test | Cd, SDS | 14 | 2
    2019.0 | Rainbow Trout Survival Test | Cu, Zn | 10 | 1
    2021.0 | Daphnia magna Survival Test | Cd | 48 | 5
    2022.0 | Daphnia pulex Survival Test | Cu, NaCl, SDS Cd, Cu, NaCl, NaPCP | 57 | 6

    [a] See publications EPA/600/4-89-001 (USEPA 1989) and EPA/600/4-91-002 (USEPA 1994b).
    [b] The genus and species names for Selenastrum capricornutum have been changed to Raphidocelis subcapitata. In this document, however, Selenastrum capricornutum is used to avoid confusion.
    [c] See publication EPA/600/4-91-003 (USEPA 1994a) and EPA/600/4-87/028 (USEPA 1988).
    [d] See publications EPA/600/4-85/013 (USEPA 1985) and EPA/600/4-90/027F (USEPA 1993).
    [e] EPA did not assign method numbers for acute methods in EPA/600/4-90/027F. The numbers assigned here were created for use in this document and in related materials and data bases.

    Reference toxicant codes: Cd = cadmium; Cr = chromium; Cu = copper; KCl = potassium chloride; NaCl = sodium chloride; NaPCP = sodium pentachlorophenate; SDS = sodium dodecyl sulfate; Zn = zinc.

    Table 3-2. Quartiles (25th and 75th) and Median (50th) of the Within-Laboratory Values of CV for EC25 (Chronic Tests)

    Test Method [a] | Test Method No. | Endpoint [b] | No. of Labs | CV 25th | CV 50th | CV 75th
    Fathead Minnow Larval Survival & Growth | 1000.0 | G | 19 | 0.21 | 0.26 | 0.38
    Fathead Minnow Larval Survival & Growth | 1000.0 | S | 16 | 0.11 | 0.22 | 0.32
    Ceriodaphnia (Cd) Survival & Reproduction | 1002.0 | R | 33 | 0.17 | 0.27 | 0.45
    Ceriodaphnia (Cd) Survival & Reproduction | 1002.0 | S | 25 | 0.11 | 0.23 | 0.41
    Green Alga (Selenastrum) Growth | 1003.0 | G | 6 | 0.25 | 0.26 | 0.39
    Sheepshead Minnow Larval Survival & Growth | 1004.0 | G | 5 | 0.09 | 0.13 | 0.14
    Sheepshead Minnow Larval Survival & Growth | 1004.0 | S | 2 | 0.15 | 0.16 | 0.17
    Inland Silverside Larval Survival & Growth | 1006.0 | G | 16 | 0.18 | 0.27 | 0.43
    Inland Silverside Larval Survival & Growth | 1006.0 | S | 13 | 0.22 | 0.35 | 0.42
    Mysid (Ab) Survival, Growth, & Fecundity | 1007.0 | F | 4 | 0.30 | 0.38 | 0.41
    Mysid (Ab) Survival, Growth, & Fecundity | 1007.0 | G | 10 | 0.24 | 0.28 | 0.32
    Mysid (Ab) Survival, Growth, & Fecundity | 1007.0 | S | 7 | 0.17 | 0.21 | 0.28
    Red Macroalga (Champia parvula) Reproduction | 1009.0 | R | 2 | 0.58 | 0.58 | 0.59

    [a] Cd = Ceriodaphnia dubia, Ab = Americamysis (Mysidopsis) bahia
    [b] G = growth, S = survival, R = reproduction, F = fecundity

    Table 3-3. Quartiles (25th and 75th) and Median (50th) of the Within-Laboratory Values of CV for LC50

    Test Method [a] | Test Method No. | Endpoint [b] | No. of Labs | CV 25th | CV 50th | CV 75th

    Freshwater Methods for Chronic Toxicity [c]
    Fathead Minnow Larval Survival & Growth | 1000.0 | S | 19 | 0.15 | 0.23 | 0.31
    Ceriodaphnia (Cd) Survival & Reproduction | 1002.0 | S | 33 | 0.10 | 0.16 | 0.29
    Sheepshead Minnow Larval Survival & Growth | 1004.0 | S | 5 | 0.07 | 0.08 | 0.12
    Inland Silverside Larval Survival & Growth | 1006.0 | S | 16 | 0.16 | 0.28 | 0.35
    Mysid (Ab) Survival, Growth, & Fecundity | 1007.0 | S | 10 | 0.16 | 0.26 | 0.27

    Methods for Acute Toxicity [d,e]
    Fathead Minnow Larval Survival | 2000.0 | S | 21 | 0.10 | 0.16 | 0.19
    Ceriodaphnia (Cd) Survival | 2002.0 | S | 23 | 0.11 | 0.19 | 0.29
    Sheepshead Minnow Survival | 2004.0 | S | 5 | 0.12 | 0.14 | 0.21
    Inland Silverside Larval Survival | 2006.0 | S | 5 | 0.15 | 0.16 | 0.21
    Mysid (Ab) Survival | 2007.0 | S | 3 | 0.17 | 0.25 | 0.26
    Mysid (Hc) Survival | 2011.0 | S | 2 | 0.27 | 0.30 | 0.34
    Rainbow Trout Survival | 2019.0 | S | 1 | 0.23 | 0.23 | 0.23
    Daphnia (Dm) Survival | 2021.0 | S | 5 | 0.07 | 0.22 | 0.24
    Daphnia (Dp) Survival | 2022.0 | S | 6 | 0.19 | 0.21 | 0.27

    [a] Cd = Ceriodaphnia dubia, Ab = Americamysis (Mysidopsis) bahia, Hc = Holmesimysis costata, Dm = Daphnia magna, Dp = Daphnia pulex
    [b] S = survival
    [c] See publications EPA/600/4-89-001 (USEPA 1989) and EPA/600/4-91-002 (USEPA 1994b).
    [d] See publications EPA/600/4-85/013 (USEPA 1985) and EPA/600/4-90/027F (USEPA 1993).
    [e] EPA did not assign method numbers for acute methods in EPA/600/4-90/027F. The numbers assigned here were created for use in this document and in related materials and data bases.

    Table 3-4. Quartiles (25th and 75th) and Median (50th) of the Within-Laboratory Values of CV for NOEC

    Test Method [a] | Test Method No. | Endpoint [b] | No. of Labs | CV 25th | CV 50th | CV 75th

    Freshwater Methods for Chronic Toxicity [c]
    Fathead Minnow Larval Survival & Growth | 1000.0 | G | 19 | 0.22 | 0.37 | 0.53
    Fathead Minnow Larval Survival & Growth | 1000.0 | S | 19 | 0.26 | 0.39 | 0.48
    Ceriodaphnia (Cd) Survival & Reproduction | 1002.0 | R | 33 | 0.25 | 0.33 | 0.49
    Ceriodaphnia (Cd) Survival & Reproduction | 1002.0 | S | 33 | 0.21 | 0.30 | 0.43
    Green Alga (Selenastrum) Growth | 1003.0 | G | 9 | 0.40 | 0.46 | 0.56

    Marine & Estuarine Methods for Chronic Toxicity [d]
    Sheepshead Minnow Larval Survival & Growth | 1004.0 | G | 5 | 0.34 | 0.40 | 0.44
    Sheepshead Minnow Larval Survival & Growth | 1004.0 | S | 5 | 0.14 | 0.18 | 0.24
    Inland Silverside Larval Survival & Growth | 1006.0 | G | 16 | 0.31 | 0.46 | 0.57
    Inland Silverside Larval Survival & Growth | 1006.0 | S | 16 | 0.30 | 0.42 | 0.55
    Mysid (Ab) Survival, Growth, & Fecundity | 1007.0 | F | 4 | 0.17 | 0.36 | 0.40
    Mysid (Ab) Survival, Growth, & Fecundity | 1007.0 | G | 10 | 0.35 | 0.39 | 0.43
    Mysid (Ab) Survival, Growth, & Fecundity | 1007.0 | S | 10 | 0.28 | 0.33 | 0.38
    Red Macroalga (Champia parvula) Reprod. | 1009.0 | R | 2 | 0.85 | 1.00 | 1.16

    Methods for Acute Toxicity [e,f]
    Fathead Minnow Larval Survival | 2000.0 | S | 21 | 0.18 | 0.22 | 0.34
    Ceriodaphnia (Cd) Survival | 2002.0 | S | 23 | 0.18 | 0.35 | 0.41
    Sheepshead Minnow Survival | 2004.0 | S | 3 | 0 | 0.31 | 0.33
    Inland Silverside Larval Survival | 2006.0 | S | 5 | 0 | 0.33 | 0.35
    Mysid (Ab) Survival | 2007.0 | S | 3 | 0.29 | 0.38 | 0.43
    Mysid (Hc) Survival | 2011.0 | S | 2 | 0.21 | 0.26 | 0.31
    Rainbow Trout Survival | 2019.0 | S | 1 | 0.35 | 0.35 | 0.35
    Daphnia magna (Dm) Survival | 2021.0 | S | 5 | 0.09 | 0.36 | 0.47
    Daphnia pulex (Dp) Survival | 2022.0 | S | 6 | 0.21 | 0.38 | 0.61

    [a] Cd = Ceriodaphnia dubia, Ab = Americamysis (Mysidopsis) bahia, Hc = Holmesimysis costata, Dm = Daphnia magna, Dp = Daphnia pulex
    [b] G = growth, S = survival, R = reproduction, F = fecundity
    [c] See publications EPA/600/4-89-001 (USEPA 1989) and EPA/600/4-91-002 (USEPA 1994b).
    [d] See publication EPA/600/4-91-003 (USEPA 1994a) and EPA/600/4-87/028 (USEPA 1988).
    [e] See publications EPA/600/4-85/013 (USEPA 1985) and EPA/600/4-90/027F (USEPA 1993).
    [f] EPA did not assign method numbers for acute methods in EPA/600/4-90/027F. The numbers assigned here were created for use in this document and in related materials and data bases.

    Effect concentrations having a p-percent effect are symbolized as ECp and may be calculated for sublethal and lethal (survival) endpoints (USEPA 1993, 1994a, 1994b). Effect concentrations commonly estimated for WET methods are LC50, EC50, IC25, and EC25. The symbol ECp is more general and may be used to represent an LCp, ECp, or ICp endpoint. To simplify presentation of results in this document, the term EC25 is used to represent the concentration at which a 25-percent effect has occurred for either lethal or sublethal endpoints. The term LC50 is used to represent the concentration at which a 50-percent effect has occurred for lethal endpoints. The EC25 for survival is not routinely used in generating self-monitoring data and is presented here for comparison to the EC25 for sublethal endpoints (i.e., IC25). Estimates of EC25, LC50, and NOEC were calculated for this document as required in the EPA test methods (USEPA 1993, 1994a, 1994b). A CV is reported for NOEC measurements in this document. See Appendix A for further details.
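    To make the idea of a p-percent effect concentration concrete, the Python sketch below estimates an ECp by linear interpolation between tested concentrations, after smoothing the treatment means so that they never increase with concentration. It is a simplified illustration in the spirit of an interpolation estimate, not a reproduction of the procedures in the EPA method manuals: equal replication is assumed in the smoothing step, no confidence interval is computed, the response is assumed to decrease with toxicity (growth, reproduction, or proportion surviving), and the function names and data are hypothetical.

```python
from typing import List, Optional


def smooth_nonincreasing(means: List[float]) -> List[float]:
    """Pool adjacent means (equal weights assumed) until they never increase."""
    blocks: List[list] = []  # each block is [sum of means, number of means]
    for m in means:
        blocks.append([m, 1])
        # Merge while the previous block's mean is below the newest block's mean.
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] < blocks[-1][0] / blocks[-1][1]
        ):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    smoothed: List[float] = []
    for s, c in blocks:
        smoothed.extend([s / c] * c)
    return smoothed


def ecp(concs: List[float], means: List[float], p: float) -> Optional[float]:
    """Concentration giving a p-percent reduction from the control mean.

    concs[0] is the control (concentration 0). Returns None when a
    p-percent reduction is not reached within the tested range.
    """
    m = smooth_nonincreasing(means)
    target = m[0] * (1.0 - p / 100.0)
    for j in range(len(m) - 1):
        if m[j] >= target >= m[j + 1] and m[j] != m[j + 1]:
            slope = (concs[j + 1] - concs[j]) / (m[j + 1] - m[j])
            return concs[j] + (target - m[j]) * slope
    return None


# Hypothetical concentration series and mean responses per concentration.
concs = [0.0, 1.0, 2.0, 4.0, 8.0]
means = [10.0, 10.4, 9.0, 6.0, 2.0]
print("EC25 estimate:", ecp(concs, means, 25))
```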

    The results in Tables 3-2 through 3-4 were obtained as follows, using as an example the EC25 of the growth endpoint in Method 1000.0 (fathead minnow larval chronic test) on the first row of Table 3-2. The CV of the EC25 estimates was calculated for each laboratory. This calculation resulted in 19 CVs (one per laboratory with each laboratory tested using one toxicant). The sample percentiles were calculated for this set of 19 CVs. In Table 3-2, the column headed “50th” shows the 50th percentile (median value) of CV found across these 19 laboratories; the 50th percentile value is 0.26. In the column headed “75th,” the 75th percentile CV is reported as 0.38. When a method is represented by fewer than four laboratories, the minimum and maximum CVs are shown in the columns headed “25th” and “75th,” respectively. Note that these CVs represent within-laboratory variability, and that Tables 3-2 through 3-4 show the quartiles and median of the within-laboratory CVs. These tables thus report the typical range of within-laboratory test method variation.
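    The procedure just described can be sketched in Python as follows. The laboratory labels and EC25 values are hypothetical, and the percentile definition (linear interpolation, via `statistics.quantiles` with `method="inclusive"`) is an assumption, since the text does not say which sample-percentile definition EPA used.

```python
import statistics
from typing import Dict, List, Tuple


def within_lab_cv(estimates: List[float]) -> float:
    """CV of one laboratory's effect concentration estimates (sample SD / mean)."""
    return statistics.stdev(estimates) / statistics.mean(estimates)


def cv_quartiles(ec_by_lab: Dict[str, List[float]]) -> Tuple[float, float, float]:
    """Return (25th, 50th, 75th) percentiles of the within-laboratory CVs.

    With fewer than four laboratories, the minimum and maximum CV stand in
    for the 25th and 75th percentiles, as described for Tables 3-2 to 3-4.
    """
    cvs = sorted(within_lab_cv(v) for v in ec_by_lab.values())
    median = statistics.median(cvs)
    if len(cvs) < 4:
        return cvs[0], median, cvs[-1]
    # Assumption: linear-interpolation quantiles (not confirmed by the source).
    q1, _, q3 = statistics.quantiles(cvs, n=4, method="inclusive")
    return q1, median, q3


# Hypothetical EC25 estimates (one list of repeated tests per laboratory).
ec25_by_lab = {
    "lab 1": [2.4, 2.9, 2.6, 3.1],
    "lab 2": [1.8, 2.7, 2.2],
    "lab 3": [2.5, 2.6, 2.4, 2.7, 2.5],
    "lab 4": [3.0, 1.9, 2.8],
    "lab 5": [2.2, 2.3, 2.9, 2.0],
}
print([round(q, 2) for q in cv_quartiles(ec25_by_lab)])
```

    Each laboratory needs at least two tests for its CV to be defined.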

    Variation across laboratories in the CV for effect concentration estimates (Tables 3-2 through 3-4) may be summarized as follows, ignoring methods represented by only one or two laboratories. [Refer to the column headed “75th” (the 75th percentile).]

    For the EC25 of the growth and reproduction endpoints in chronic toxicity tests, 75 percent of laboratories have a CV no more than 0.14 to 0.45 depending on the method (Table 3-2). For the two most commonly used methods (1000.0, fathead minnow larval chronic test; and 1002.0, Ceriodaphnia chronic test), 75 percent of the laboratories have CVs no more than 0.38 and 0.45, respectively.

    For the LC50 of the survival endpoint in chronic toxicity tests, 75 percent of laboratories have a CV no more than 0.12 to 0.35, depending on the method. For the two most commonly used methods (1000.0 and 1002.0), 75 percent of laboratories have CVs no more than 0.31 and 0.29, respectively (Table 3-3). For the LC50 in acute toxicity tests, 75 percent of laboratories have a CV no more than 0.19 to 0.29, depending on the method. For the two most commonly used methods (2000.0 and 2002.0), 75 percent of laboratories have CVs no more than 0.19 and 0.29, respectively.

    For the NOEC of growth or reproduction endpoints in chronic toxicity tests, 75 percent of laboratories have a CV no more than 0.43 to 0.57, depending on the method. For the two most commonly used methods (1000.0 and 1002.0), 75 percent of laboratories have CVs no more than 0.53 and 0.49, respectively (Table 3-4). For the NOEC of survival in chronic toxicity tests, 75 percent of laboratories have a CV no more than 0.24 to 0.55, depending on the method. For the two most commonly used methods (1000.0 and 1002.0), 75 percent of laboratories have CVs no more than 0.48 and 0.43, respectively. For the NOEC of survival in acute toxicity tests, 75 percent of laboratories have a CV no more than 0.34 to 0.61, depending on the method. For the two most commonly used acute methods (2000.0 and 2002.0), 75 percent of laboratories have CVs no more than 0.34 and 0.41, respectively.

    Appendix B discusses the range of toxicant concentrations reported as the NOEC. For chronic toxicity tests, most laboratories report the NOEC to within two to three concentration intervals, and half the laboratories report most NOECs within one to two concentration intervals for reference toxicants. For acute toxicity tests, most laboratories report NOECs at one or two concentrations. This outcome agrees with EPA’s expected performance for these methods. The normal variation of the effect concentration estimate in reference toxicant tests has been reported for some EPA WET methods (USEPA 1994a, 1994b) to be plus or minus one dilution concentration for the NOEC and less for LC50.


    3.2.2 Between-Laboratory Variability of EC25, LC50, and NOEC

    The data set compiled for this document provided reasonable estimates of between-laboratory variability for only a few methods. For many methods and toxicants, there were too few laboratories in the data base. Additional summaries of between-laboratory variability of WET methods are included in the TSD (USEPA 1991a, Part 1.3.3) and the WET methods manuals (USEPA 1994a, 1994b). EPA also intends to provide new data in a forthcoming EPA between-laboratory study of promulgated methods.

    Using the data set, credible estimates of between-laboratory variability could be made for a few toxicants and methods having data for six or more laboratories (Table 3-5). The statistical methods are described in Appendix B. Table 3-5 shows values of the square root of within-laboratory and between-laboratory variance components (i.e., standard deviations, σ). The standard deviations and mean are expressed in units of toxicant concentration (e.g., g/L or mg/L). Between-laboratory σb estimates the standard deviation for laboratory means of EC25, LC50, and NOEC. The “Mean” column in Table 3-5 shows the mean of the laboratory means, not the mean for all tests. Because the number of tests differed among laboratories, these two means are different. These data suggest that between-laboratory variability (σb) is comparable to within-laboratory variability (σw) for the methods listed in the table.
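    Appendix B describes the statistical methods actually used. Purely to illustrate what within-laboratory and between-laboratory variance components are, the Python sketch below applies the standard method-of-moments estimator from a one-way random-effects analysis of variance to unbalanced data. Using this particular estimator is an assumption made here; it is not necessarily the one behind Table 3-5, and the data are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def variance_components(by_lab: Dict[str, List[float]]) -> Tuple[float, float, float]:
    """Return (within-lab SD, between-lab SD, mean of laboratory means)."""
    groups = list(by_lab.values())
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    lab_means = [sum(g) / len(g) for g in groups]

    ss_within = sum((x - m) ** 2 for g, m in zip(groups, lab_means) for x in g)
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, lab_means))
    ms_within = ss_within / (n_total - k)
    ms_between = ss_between / (k - 1)

    # Effective number of tests per laboratory when group sizes are unequal.
    n0 = (n_total - sum(len(g) ** 2 for g in groups) / n_total) / (k - 1)
    var_between = max(0.0, (ms_between - ms_within) / n0)  # negative -> 0

    return math.sqrt(ms_within), math.sqrt(var_between), sum(lab_means) / k


# Hypothetical LC50 estimates; laboratories ran different numbers of tests.
lc50_by_lab = {
    "lab 1": [4.0, 4.4, 3.8, 4.1],
    "lab 2": [3.2, 3.6],
    "lab 3": [4.9, 4.6, 5.1],
}
sd_w, sd_b, mean_of_means = variance_components(lc50_by_lab)
pooled_mean = sum(sum(v) for v in lc50_by_lab.values()) / sum(
    len(v) for v in lc50_by_lab.values())
print("within-lab SD:", round(sd_w, 3), "between-lab SD:", round(sd_b, 3))
print("mean of lab means:", round(mean_of_means, 3),
      "mean of all tests:", round(pooled_mean, 3))
```

    The last line shows why the mean of laboratory means differs from the mean of all tests when laboratories contribute unequal numbers of tests.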

    In Table 3-5, the ratio of σb to the mean is an estimate of the relative variability (CVb) of laboratory means around their combined mean. The ratio of σw to the mean may approach the value of the average within-laboratory CV when the sample of laboratories is large, but to characterize within-laboratory CVs, readers should use Tables 3-2 through 3-4.
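    As a quick arithmetic check of these ratios, the Python snippet below recomputes the two CVs for the first row of Table 3-5 (Method 1000.0, EC25, NaCl, growth endpoint) from the tabulated standard deviations and mean. The variable names are illustrative.

```python
# Values copied from the first row of Table 3-5.
sd_within = 0.67          # within-laboratory standard deviation
sd_between = 0.44         # between-laboratory standard deviation
mean_of_lab_means = 2.63  # "Mean" column (mean of the laboratory means)

cv_w = sd_within / mean_of_lab_means
cv_b = sd_between / mean_of_lab_means

print(f"CVw = {cv_w:.2f}")  # Table 3-5 reports 0.25
print(f"CVb = {cv_b:.2f}")  # Table 3-5 reports 0.17
```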

    Table 3-5. Estimates of Within-Laboratory and Between-Laboratory Components of Variability [a]

    Test Method [b] | EC Estimate | Toxicant | Endpoint [c] | Tests | Labs | Within-lab σw | Between-lab σb | Mean | CVw | CVb
    1000.0 | EC25 | NaCl | G | 73 | 6 | 0.67 | 0.44 | 2.63 | 0.25 | 0.17
    1000.0 | LC50 | NaCl | S | 73 | 6 | 1.14 | 0.45 | 4.15 | 0.27 | 0.11
    1000.0 | NOEC | NaCl | G | 73 | 6 | 0.72 | 0.35 | 2.18 | 0.33 | 0.16
    1000.0 | NOEC | NaCl | S | 73 | 6 | 0.96 | 0.51 | 2.43 | 0.40 | 0.21
    1002.0 | EC25 | NaCl | R | 292 | 23 | 0.29 | 0.27 | 0.92 | 0.32 | 0.29
    1002.0 | LC50 | NaCl | S | 285 | 23 | 0.48 | 0.24 | 1.78 | 0.27 | 0.13
    1002.0 | NOEC | NaCl | G | 292 | 23 | 0.28 | 0.18 | 0.74 | 0.38 | 0.24
    1002.0 | NOEC | NaCl | S | 292 | 23 | 0.47 | 0.26 | 1.42 | 0.33 | 0.18
    1006.0 | EC25 | Cu | G | 130 | 9 | 45.1 | 52.4 | 97.4 | 0.46 | 0.54
    1006.0 | LC50 | Cu | S | 130 | 9 | 48.4 | 70.7 | 127.0 | 0.38 | 0.56
    1006.0 | NOEC | Cu | G | 130 | 9 | 51.8 | 44.4 | 80.1 | 0.65 | 0.55
    1006.0 | NOEC | Cu | S | 130 | 9 | 34.2 | 39.5 | 65.4 | 0.52 | 0.60
    2000.0 | LC50 | NaCl | S | 154 | 14 | 1.05 | 1.24 | 7.46 | 0.14 | 0.17
    2002.0 | LC50 | NaCl | S | 167 | 15 | 0.36 | 0.38 | 1.97 | 0.18 | 0.19

    [a] σw = within-laboratory standard deviation, σb = between-laboratory standard deviation; CVw = within-laboratory coefficient of variation, CVb = between-laboratory coefficient of variation
    [b] EPA did not assign method numbers for acute methods in EPA/600/4-90/027F. The numbers assigned here were created for use in this document and in related materials and data bases.
    [c] G = growth, S = survival, R = reproduction

    3.3 Variability of Endpoint Measurements

    This section characterizes the within-laboratory precision of endpoint measurements (e.g., growth, reproduction, and survival). Endpoint variability in methods for chronic toxicity is characterized here using sublethal endpoints. The sublethal endpoint was designed to be more sensitive than the survival endpoint, and it incorporates the effect of mortality (i.e., it incorporates biomass). For example, for the chronic survival and growth fathead minnow larval test, the total dry weight at each replicate is divided by the original number of larvae, rather than the surviving number of larvae.

    EPA reports measures of test precision based on the control CV [(control standard deviation)/(control mean)] and the “Percent MSD” [100×MSD/(control mean)], symbolized as PMSD. Recall that MSD, the “minimum significant difference,” is calculated as [d × √EMS × √(2/r)], where “d” is the critical value of Dunnett’s statistic when comparing “k” treatments to a control, EMS is the error mean square from the analysis of variance of the endpoint responses, and “r” is the number of replicates at each concentration (USEPA 1993, 1994a, 1994b). These measures of test precision quantify within-test variability, or the sensitivity of each test to toxic effects on the biological endpoint.

    Measures of variability relative to the control mean are used for two reasons. First, a laboratory having consistently large mean endpoint values for the control will also tend to have larger values of MSD and control standard deviation. Second, PMSD is readily interpreted as the minimum percent difference between control and treatment that can be declared statistically significant in a WET test. A significant effect occurs when (control mean - treatment mean) exceeds the MSD. Dividing by the control mean and multiplying by 100 states this relationship in terms of the percent difference between control and treatment.
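    The precision measures discussed above can be computed from replicate data as in the following Python sketch, which uses MSD = d × √EMS × √(2/r) and PMSD = 100 × MSD / (control mean). The replicate values are hypothetical, equal replication is assumed, and the value of `d` is a placeholder: the actual critical value must be looked up from a table of Dunnett's statistic for the test's number of treatments and error degrees of freedom.

```python
import math
import statistics
from typing import List


def error_mean_square(groups: List[List[float]]) -> float:
    """Pooled within-group variance (EMS) from a one-way analysis of variance."""
    ss_error = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)
    df_error = sum(len(g) for g in groups) - len(groups)
    return ss_error / df_error


def msd(d: float, ems: float, r: int) -> float:
    """Minimum significant difference for r replicates per concentration."""
    return d * math.sqrt(ems) * math.sqrt(2.0 / r)


def pmsd(msd_value: float, control_mean: float) -> float:
    """MSD expressed as a percentage of the control mean."""
    return 100.0 * msd_value / control_mean


def control_cv(control: List[float]) -> float:
    """Control CV = control standard deviation / control mean."""
    return statistics.stdev(control) / statistics.mean(control)


# Hypothetical replicate responses (e.g., dry weight per original larva).
control = [0.52, 0.48, 0.50, 0.54]
treatments = [[0.49, 0.47, 0.51, 0.50], [0.40, 0.38, 0.43, 0.41]]

d = 2.4  # placeholder only; take the real value from Dunnett's table
ems = error_mean_square([control] + treatments)
msd_value = msd(d, ems, r=len(control))
control_mean = statistics.mean(control)

print("PMSD (%):", round(pmsd(msd_value, control_mean), 1))
print("control CV:", round(control_cv(control), 3))
for t in treatments:
    # A reduction is declared significant when it exceeds the MSD.
    print("significant reduction:", control_mean - statistics.mean(t) > msd_value)
```

    Dividing both sides of the comparison by the control mean gives the equivalent statement that the percent reduction from the control exceeds the PMSD.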

    To characterize the distribution of values of PMSD, values from all laboratories and toxicants for a given method and endpoint were combined, and sample percentiles reported. Percentiles are also reported for the CV of the control, which also indicates variability among replicates under non-toxic conditions and may be a useful indicator of uniformity of the test organisms. The sample percentiles are reported in more detail in Appendix B; the 10th and 90th perc