INTRODUCTION TO THE DATA QUALITY OBJECTIVES PROCESS.
Uploaded by: charla-stephens
INTRODUCTION TO THE DATA QUALITY
OBJECTIVES PROCESS
Course Objectives
At the conclusion of this course, participants will understand:
• The Agency's Quality System and the elements of the DQO Process
• How the DQO process applies to EPA programs
• How to interpret the consequences of potential decision errors.
Systematic Planning
• Agency policy requires the use of a systematic planning process to develop performance criteria
• DQO Process defines performance and acceptance criteria for decision making
• EPA recommends the DQO Process
What is the DQO Process?
The DQO Process is a systematic
planning process for generating
environmental data that will be sufficient
for their intended use.
What are DQOs?
DQOs are quantitative and qualitative criteria that:
• Clarify study objectives
• Define appropriate types of data to collect
• Specify the tolerable levels of potential decision errors
DQO Process
• Planning Tool for Managing Decision Errors
• Improves: Planning Effectiveness Design Efficiency Defensibility of results/decisions
• Generates appropriate data Type Quality Quantity
DQO Process
Designed to answer:
• What do you need?
• Why do you need it?
• How will you use it?
• What is your tolerance for errors?
DQO Process: Underlying Principles
1. All collected data have error.
2. Nobody can afford absolute certainty.
3. The DQO Process defines tolerable error rates.
4. Absent DQOs, decisions are uninformed.
5. Uninformed decisions tend to be conservative and expensive.
DQOs Strike a Balance
[Figure: a balance with DQOs at the center, weighing Time/Resources against Uncertainty; each side is labeled from Decreasing to Increasing.]
DQOs in the Context of the Project Life Cycle
• Planning: Plan for Data Collection Using the DQO Process (EPA QA/G-4)
• Implementation: Collect Environmental Data Using Documented Sampling Schemes (EPA QA/G-5)
• Assessment: Conduct Data Quality Assessment (EPA QA/G-9)
• Make the Decision
The DQO Process
Problem (Investigation or Study)
Resource-Effective Data Collection Design
1. State the Problem.
2. Identify the Decision.
3. Identify the Inputs to the Decision.
4. Define the Boundaries of the Study.
5. Develop a Decision Rule.
6. Specify Tolerable Limits on Decision Errors.
7. Optimize the Design.
Repeated Application of the DQO Process
[Figure: the DQO steps (State the Problem, Identify the Decision, Identify Inputs to the Decision, Define the Study Boundaries, Develop a Decision Rule, Specify Limits on Decision Errors, Optimize the Design) are repeated, iterating as needed, for a primary, an intermediate, and an advanced study decision, with an increasing level of evaluation effort. Each cycle ends with "Study Planning Completed". One branch is labeled "Decide not to use probabilistic sampling approach".]
Data Quality Objectives: Outputs from Each Step of the Process
Problem:
Decision:
Inputs:
Boundaries:
Decision Rule:
Limits on Decision Errors:
DQOs
The DQO Process Promotes Communication
The same term can carry two different meanings for the Decision Maker and the Data Collector; the DQO Process sits between them:
• Parameter: mean, percentile / limits of study
• Risk: carcinogen / poor decision
• Media: soil, water / press, TV
• Variance: variability in data / exception to a rule
• Sample: analytical portion / collection of items
A Quality Planning Model
[Figure: effective communication between the Decision Maker (Data User) and the Data Collector. Needs, understanding, and approval pass through the DQOs (environmental data) and result in performance specifications.]
The DQO Process Encourages Efficient Planning
• Clearly stated objectives
• A framework for organizing complex issues
• Limits on decision errors specified
• Efficient resource expenditure
DATA QUALITY OBJECTIVES
Seven Steps of the DQO Process
1. State the problem to be resolved.
2. Identify the decision to be made.
3. Identify the inputs to the decision.
4. Define the boundaries of the study.
5. Develop a decision rule.
6. Specify the tolerable limits on decision errors.
7. Optimize the design for obtaining the data.
Stating the Problem
Who should participate on the planning team?
• Risk Assessor
• Scientist/Engineer
• Statistician/Data Analyst
• Data User/Decision Maker
• Lab and Field Personnel
• QA Specialist
What is the problem?
What resources are available?
What time is available?
What important social/political issues have an impact on the decision?
Wood Preserving Site: Background
• State-led investigation of a possible soil contamination problem
• Creosoting of timbers
• Soil contaminated with creosote
• Creosote contains Polyaromatic Hydrocarbons (PAHs)
• Early Sampling Results:
– Soil PAH concentration in low activity area: 0-80 mg/kg
– Soil PAH concentration in high activity area: 80-140 mg/kg
– Off site: Not detected
• Future land use will be residential
Wood Preserving Site: Background
The Team:
• Decision Maker
• Chemist
• Field Sampling Technician
• QA Specialist
• Risk Assessor/Toxicologist
• Environmental Scientist with Statistical Training
Wood Preserving Site: Problem Statement
The Problem: Obvious creosote contamination in the soil may pose a danger to human health or the environment. Information is necessary to determine the extent of danger.
Resources: Measurement Budget = $100,000
Time Limit: Remediate in 1 year
Socio-political: Future land use is residential
Identifying the Decision
• Identify the principal study question. Clarify the main issue to be resolved.
• Specify the alternative actions that would result from each resolution.
Associate a course of action with each possible answer.
• Define the decision statement that must be resolved to address the problem.
Combine the principal study question and the alternative actions into a specific decision statement.
Wood Preserving Site: Identifying the Decision
Study Question:
– Does creosote contamination in the soil pose an unacceptable danger to human health or the environment?
Alternative Actions:
– Remediate the soil
– Do not remediate the soil (no action)
Decision Statement:
– Determine whether the creosote contamination in soil poses a danger that requires remediation.
Identifying Inputs for the Decision
• Focus on what information is needed for the decision.
• Identify the variables/characteristics to be measured.
• Identify the information needed to establish the action level.
Wood Preserving Site: Inputs Needed for Decision
Variable of Interest: PAHs. Some PAHs are carcinogens that are dangerous to human health.
Action Level: Set at 50 ppm by a toxicologist using a relevant site-specific exposure assessment.
Defining the Boundaries
• Define the spatial boundary for the decision
– Define the geographical area within which decisions apply
– Define the media of concern
– Divide each medium into homogeneous strata
• Define the temporal boundary of the decision
– Determine the time frame to which the study results apply
– Determine when to study
• Define a scale of decision making
• Identify practical constraints on data collection
Wood Preserving Site: Spatial Boundaries
• Define the geographical area within which decisions apply:
The property boundary (No PAHs detected off site)
• Specify the characteristics that define the population of interest:
PAHs in surface soil to 15 cm depth
• Divide each medium into homogeneous strata:
The site has been divided into two areas:
1) Area of high activity where the concentration is expected to be high
2) Area of low activity where the concentration is expected to be low
Wood Preserving Site: Temporal Boundaries
• Determine the time frame to which the study results apply:
The results will represent future conditions at the site. (Future lifetime exposure for residents)
• Determine when data should be collected:
Sampling begins in 3 months.
Remediation completed within 1 year.
Sampling results will not vary depending on weather conditions.
Wood Preserving Site: Defining the Boundaries
Scale of Decision Making:
• Decisions will be made for each residential lot-sized area (based on future land use)
Practical Constraints:
• Existing structures and debris may limit sampling locations
Develop a Decision Rule
Develop an "if/then" statement that incorporates:
• The population parameter of interest
(e.g., mean, maximum, percentile)
• The scale of decision making
(e.g., residential lot size)
• The action-triggering value
• The alternative actions
Wood Preserving Site: Decision Rule
Use average (mean) PAH concentrations to identify lots that pose a health threat.
– If the true mean PAH concentration within a residential lot is greater than 50 mg/kg, then the soil will be remediated.
– If not, then the soil will be left in situ.
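The if/then rule above can be sketched in a few lines of Python. This is an illustration only: the lot names and PAH results are hypothetical, and the sample mean is used as an estimate of the true mean that the rule refers to, which is why the next step sets limits on decision errors.

```python
from statistics import mean

ACTION_LEVEL_MG_KG = 50.0  # action level stated on the slides


def decide(lot_results_mg_kg):
    """Apply the if/then decision rule to one residential lot.

    The sample mean stands in for the true mean, so the answer can be wrong.
    """
    estimate = mean(lot_results_mg_kg)
    action = "remediate" if estimate > ACTION_LEVEL_MG_KG else "leave in situ"
    return estimate, action


# Hypothetical PAH results (mg/kg) for two lots; illustrative values only.
lots = {
    "lot_A": [62.0, 48.0, 71.0, 55.0],
    "lot_B": [12.0, 30.0, 8.0, 22.0],
}

for name, results in lots.items():
    estimate, action = decide(results)
    print(f"{name}: mean = {estimate:.1f} mg/kg -> {action}")
```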
Specify Limits on Decision Errors
• Determine the possible range of the parameter of interest
• Determine baseline condition (null hypothesis)
• Determine consequences of each decision error. Consequences may include:
Health risks Ecological risks Political risks Social risks Resource risks
Specifying Limits on Decision Error
• Specify the gray region - a range of possible parameter values where the consequences of decision errors are relatively minor (too close to call)
– Bounded on one side by the action level
– Bounded on the other side by the parameter value where the consequences of making a decision error begin to be significant
• Set quantitative limits on false rejection and false acceptance errors by considering the consequences of these potential decision errors.
Statistical Error Types
• Rejecting the baseline condition when it is true is a False Rejection error, F(r).
Decision: Not hazardous when it actually is hazardous
• Accepting the baseline condition when it is false is a False Acceptance error, F(a).
Decision: Hazardous when it actually is not hazardous
Decision Errors: Synonyms and Plain English
If the baseline assumption is that the program or site is in compliance, then:
False Rejection Error F(r), Type I Error, False Positive
• Deciding program or site is not in compliance when it is
• An overreaction to a situation
• Wasted resources, unnecessary expenditure
False Acceptance Error F(a), Type II Error, False Negative
• Deciding program or site is in compliance when it is not
• A missed opportunity for correction
• Allowing a hazard to public health or the ecosystem
False Rejection and False Acceptance
Baseline Condition: True mean level equal to or below standard
Alternative: True mean level above standard
Decision based on a sample | In actuality: Below Standard | In actuality: Above Standard
Below Standard | Correct | False Acceptance F(a)
Above Standard | False Rejection F(r) | Correct
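The four cells of this slide can be written as a small lookup. This is a sketch for the baseline condition stated here (true mean at or below the standard); the function name and argument names are illustrative.

```python
def classify(decided_above, truly_above):
    """Classify a decision when the baseline is 'mean at or below standard'.

    decided_above: the sample led to the decision 'above standard'
    truly_above:   the true mean really is above the standard
    """
    if decided_above == truly_above:
        return "correct"
    # Baseline (below standard) was rejected although it was true.
    if decided_above and not truly_above:
        return "false rejection F(r)"
    # Baseline was accepted although the true mean is above the standard.
    return "false acceptance F(a)"


for decided in (False, True):
    for truth in (False, True):
        print(f"decided above={decided}, truly above={truth}: "
              f"{classify(decided, truth)}")
```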
Decision Errors: Synonyms and Plain English
If the baseline assumption is that the program or site is NOT in compliance, then:
False Rejection Error F(r), Type I Error, False Positive
• Deciding program or site is in compliance when it is not
• A missed opportunity for correction
• Allowing a hazard to public health or the ecosystem
False Acceptance Error F(a), Type II Error, False Negative
• Deciding program or site is not in compliance when it is
• An overreaction to a situation
• Wasted resources, unnecessary expenditure
The Probability of Making False Rejection Decision Errors
If the true mean is much greater than the action level, few low readings will occur. So, there is a small chance of reaching a wrong conclusion.
If the true mean is close to the action level, many low readings will occur. Erroneous conclusions are much more likely.
[Figure: two panels, each with a true mean of 100 ppm; one with an action level of 50 ppm, the other with an action level of 75 ppm.]
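A rough numerical sketch of this point, assuming normally distributed readings. The standard deviation (40 ppm) and the number of samples (4) are hypothetical values chosen for illustration; only the true mean and the two action levels come from the slide.

```python
from math import sqrt
from statistics import NormalDist


def chance_of_deciding_below(true_mean, action_level, sigma, n):
    """P(sample mean < action level) for n normal readings with spread sigma."""
    sampling_distribution = NormalDist(mu=true_mean, sigma=sigma / sqrt(n))
    return sampling_distribution.cdf(action_level)


SIGMA_PPM = 40.0  # hypothetical standard deviation of single readings
N_SAMPLES = 4     # hypothetical number of samples averaged

p_far = chance_of_deciding_below(100.0, 50.0, SIGMA_PPM, N_SAMPLES)
p_near = chance_of_deciding_below(100.0, 75.0, SIGMA_PPM, N_SAMPLES)

print(f"Action level 50 ppm: chance of wrongly deciding 'below' = {p_far:.4f}")
print(f"Action level 75 ppm: chance of wrongly deciding 'below' = {p_near:.4f}")
```

With these assumed values, the chance of a wrong conclusion is more than ten times larger when the action level is 75 ppm than when it is 50 ppm.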
Specify Limits on Decision Error (Construct a "What If" Table)
Assign probability values to points above and below the action level that reflect the tolerable probabilities for decision errors.
Measured Conc. (ppm) | Decision | True Conc. (ppm) | Error Type | Aversion | Tolerable Probability*
>50 | Cleanup | 0-20 | False rejection | Severe | 10%
>50 | Cleanup | 20-35 | False rejection | Moderate | 20%
>50 | Cleanup | 35-50 | False rejection | Minor | 30%
<50 | Leave | 50-100 | False acceptance | Minor | Gray Region
<50 | Leave | 100-150 | False acceptance | Moderate | 20%
<50 | Leave | 150-200 | False acceptance | Severe | 10%
<50 | Leave | 200-250 | False acceptance | Very Severe | 5%
* Probabilities are based on planning team discussions.
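The "What If" table can be held as data and queried for any true concentration, which is essentially what the Decision Performance Goal Diagram plots. A minimal sketch; treating each range as including its lower bound and excluding its upper bound is an assumption, since the table does not say how the boundaries are handled.

```python
# (low ppm, high ppm, error type, tolerable probability; None = gray region)
WHAT_IF_TABLE = [
    (0, 20, "false rejection", 0.10),
    (20, 35, "false rejection", 0.20),
    (35, 50, "false rejection", 0.30),
    (50, 100, "false acceptance", None),
    (100, 150, "false acceptance", 0.20),
    (150, 200, "false acceptance", 0.10),
    (200, 250, "false acceptance", 0.05),
]


def tolerable_limit(true_conc_ppm):
    """Return (error type, tolerable probability) for a true concentration."""
    for low, high, error_type, probability in WHAT_IF_TABLE:
        if low <= true_conc_ppm < high:  # boundary handling is an assumption
            return error_type, probability
    raise ValueError("concentration outside the 0-250 ppm planning range")


for conc in (10, 40, 75, 120, 225):
    error_type, probability = tolerable_limit(conc)
    limit = "gray region" if probability is None else f"{probability:.0%}"
    print(f"{conc:>3} ppm: {error_type}, tolerable limit = {limit}")
```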
Decision Performance Goal Diagram
[Figure: x-axis: True Concentration of PAH (mg/kg), 0 to 250; y-axis: Tolerable Chance of Deciding that the Parameter Exceeds the Action Level, 0 to 1. Marked regions: Tolerable False Rejection Decision Error Rates, Action Level, Gray Region (Too close to call), Tolerable False Acceptance Decision Error Rates. Baseline Condition: Mean < 50.]
Optimize the Design
• Develop general data collection design alternatives
– Simple random sampling
– Simple random sampling with compositing
– Stratified random sampling
• For each design, develop a cost formula, select a proposed method of data analysis, and develop a method for estimating the sample size that corresponds to the method of data analysis
• Select the most resource-effective design
– consider cost, human resources, other constraints
– consider performance of design
Decision Performance Goal Diagram with Performance Curve
[Figure: x-axis: True Concentration of PAH (mg/kg), 0 to 250; y-axis: Tolerable Chance of Deciding that the Parameter Exceeds the Action Level, 0 to 1. Marked regions: Tolerable False Rejection Decision Error Rates, Action Level, Gray Region (Too close to call), Tolerable False Acceptance Decision Error Rates, with a performance curve overlaid. Baseline Condition: Mean < 50.]
DQO Process Output
• Qualitative and Quantitative Framework for a study
• Feeds directly into the Quality Assurance Project Plan which is mandatory for EPA environmental data collection activities
DATA QUALITY OBJECTIVES: Cadmium Contaminated Fly Ash Example
Case Study Introduction
Case study - Cadmium contaminated fly ash waste
• Output from a DQO case study
• Shows how the steps of the DQO process aid in developing a sampling design
• Illustrates decisions that could be made within the Resource Conservation and Recovery Act (RCRA) Program
• Not intended to represent the policies of the RCRA Program
Cadmium Contaminated Fly Ash Waste: Background Information
• Municipal incinerator
• Fly ash dumped in municipal landfill
• Company calls ash "Non-hazardous"
Background Information
New waste stream:
• Contains cadmium
− Toxic effects: inhalation and ingestion exposure
− Short term and chronic effects
• The new ash will be tested using Toxicity Characteristic Leaching Procedure (TCLP).
• Waste will be classified as hazardous if the cadmium concentration in the TCLP leachate is greater than 1 mg/liter.
Background Information
• Pilot study - to determine the variability of the cadmium concentration in ash
• Results:
– Relatively constant variability within containers
– Relatively high variability between containers
The DQO Process: State the Problem
• Members of Planning Team
– Plant Manager
– Chemist
– Plant Engineer
– Manager, Quality Assurance
– Statistician/Data Analyst
• The Problem
– To determine which loads of ash should be sent to a RCRA facility and which can be dumped in the municipal landfill
• Available resources
– The difference in cost between municipal and RCRA disposal is $6,750.
• Project constraints
– Cost (Budget approximately $3,000 for sampling)
The DQO Process: Identify the Decision
• Define the alternative actions.
– The waste fly ash could be disposed of in a RCRA landfill.
– The waste fly ash could be disposed of in a municipal landfill.
• Form alternatives into a decision statement.
– Determine if the cadmium concentration in the TCLP leachate exceeds RCRA regulatory standards.
The DQO Process: Identify the Inputs to the Decision
• Identify key information.
– Concentration of cadmium in fly ash
– Fly ash samples subjected to the TCLP test and analyzed for cadmium
• Identify information to establish the Action Level.
– RCRA standard (1.0 mg/l using the TCLP method)
• Confirm that appropriate analytical methods exist.
– Cadmium is a metal that has a detection limit well below the RCRA standard.
The DQO Process: Define the Boundaries of the Study
• Identify the spatial boundaries.
– Fly ash in containerized bins; at least 70% capacity
• Identify temporal boundaries.
– The ash does not present an exposure hazard and will not degrade; no sampling time constraints are necessary.
• Define the scale of decision making.
– A decision will be made about each container.
• Identify practical considerations that may interfere with the study.
– Physically obtaining samples from the containers
The DQO Process: Develop a Decision Rule
• The Parameter of Interest
– The average concentration of cadmium
• Specify the Action Level for the study.
– The RCRA standard for cadmium (1.0 mg/l) in TCLP leachate
• Develop a Decision Rule.
– If the average cadmium concentration in a bin is more than 1.0 mg/l, then the ash will be disposed of in a RCRA facility.
– If the average cadmium concentration in a bin is less than 1.0 mg/l, then the ash will be disposed of in a municipal landfill.
The DQO Process: Specify Limits on Decision Errors
• Determine baseline condition
– Null hypothesis = "hazardous" (RCRA requirement) mean > 1.0 mg/l
• Identify decision errors
– False rejection:
Decide mean < 1.0 mg/l when mean > 1.0 mg/l
– False acceptance:
Decide mean > 1.0 mg/l when mean < 1.0 mg/l
• Identify limits on decision errors & gray region
The DQO Process: Tolerable Limits of Decision Error
Decision Performance for Cadmium Compliance Testing
Baseline Condition: Ash is hazardous, mean > 1
[Figure: x-axis: True Mean Concentration of Cadmium (mg/l), 0 to 2.25; y-axis: Tolerable Chance of Declaring Noncompliance, 0 to 1. Marked regions: Tolerable False Acceptance Decision Error Rates, Gray Region, Action Level, Tolerable False Rejection Decision Error Rates.]
Optimize the Design
• Develop general data collection design alternatives
– Simple random sampling
– Simple random sampling with compositing
– Sequential random sampling
• For each design, develop cost formula, select a proposed method of data analysis, develop method for estimating sample size to correspond to method for data analysis
• Select the most resource-effective design
– consider cost, human resources, other constraints
– consider performance of design
The DQO Process: Optimize the Design
Elements of the Design:
• Hypothesis Test
• Statistical Model
• Design Description/Option
• Sample Location
• Sample Cost
• Sample Size
• Design Performance
Design Options: Simple Random Sampling
• Simple Random Sample
– Simplest type of probability sampling
– Every point in the sampling medium has an equal chance of being selected.
• Application
– Small variance
– Inexpensive sampling and analysis
Design Options: Composite Sampling
• Physically combining multiple samples then drawing one or more sub-samples for analysis
• Application:
– When an average concentration is sought and there is no need to detect peak concentrations
– Large variance (allows the researchers to sample a larger number of locations)
– Reduces total cost when analytical costs are higher than sample collection costs
Design Options: Sequential Sampling
• Conduct several rounds of sampling and analysis; perform a statistical test between each round to make one of three decisions:
– Accept null hypothesis
– Reject null hypothesis
– Collect more samples
• Application
– When sampling and analysis costs are high
– When information about sampling or measurement variability is lacking
– When the waste is stable over the time frame of the sampling effort
Sample/Analysis/Disposal Costs
• Sample collection cost from each container - $10/sample
• TCLP cost - $150/analysis
• 15 tons of ash per container
• $500/ton RCRA landfill ($7,500 per container)
• $50/ton municipal landfill ($750 per container)
Decision Performance Goal Diagram with Performance Curve: Simple Random Sampling
Number of samples: 37
Cost of Data Collection Design: $5,920
Baseline Condition: Ash is hazardous, mean > 1
[Figure: x-axis: True Value of the Parameter (Mean Concentration, mg/l), 0 to 2.25; y-axis: Tolerable Chance of Deciding that the Parameter Exceeds the Action Level, 0 to 1. Marked regions: Tolerable False Acceptance Decision Error Rates, Gray Region, Tolerable False Rejection Decision Error Rates, with the performance curve overlaid.]
Decision Performance Goal Diagram with Performance Curve: Relaxed Decision Error Constraints
Number of samples: 20
Cost of Data Collection Design: $3,200
Baseline Condition: Ash is hazardous, mean > 1
[Figure: x-axis: True Value of the Parameter (Mean Concentration, mg/l), 0 to 2.25; y-axis: Tolerable Chance of Deciding that the Parameter Exceeds the Action Level, 0 to 1. Marked regions: Tolerable False Acceptance Decision Error Rates, Gray Region, Tolerable False Rejection Decision Error Rates, with the performance curve overlaid.]
Decision Performance Goal Diagram with Performance Curve: Increased Gray Region Width
Number of samples: 13
Cost of Data Collection Design: $2,080
Baseline Condition: Ash is hazardous, mean > 1
[Figure: x-axis: True Value of the Parameter (Mean Concentration, mg/l), 0 to 2.25; y-axis: Tolerable Chance of Deciding that the Parameter Exceeds the Action Level, 0 to 1. Marked regions: Tolerable False Acceptance Decision Error Rates, Gray Region, Tolerable False Rejection Decision Error Rates, with the performance curve overlaid.]
Decision Performance Goal Diagram with Performance Curve: Simple Random Sampling with Compositing
Number of samples: 64
Number of analyses: 16
Cost of Data Collection Design: $3,040
Baseline Condition: Ash is hazardous, mean > 1
[Figure: x-axis: True Value of the Parameter (Mean Concentration, mg/l), 0 to 2.25; y-axis: Tolerable Chance of Deciding that the Parameter Exceeds the Action Level, 0 to 1. Marked regions: Tolerable False Acceptance Decision Error Rates, Gray Region, Tolerable False Rejection Decision Error Rates, with the performance curve overlaid.]
Compare Overall Efficiency
Design | Cost
Simple Random Sampling* | $5,920
Simple Random Sampling with Relaxed Decision Error Constraints | $3,200
Simple Random Sampling with Increased Gray Region Width | $2,080
Simple Random Sampling with Compositing* | $3,040
Budget = $3,000
* Used original Decision Error Limits
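The four costs above follow from the unit costs given earlier ($10 per sample collected, $150 per TCLP analysis). A short sketch of the arithmetic; the strict comparison against the $3,000 budget is a simplification, since the budget is described as approximate.

```python
COLLECTION_COST = 10   # dollars per sample collected
ANALYSIS_COST = 150    # dollars per TCLP analysis
BUDGET = 3000          # dollars (the slides call this approximate)

# design name -> (samples collected, analyses performed)
designs = {
    "Simple random sampling": (37, 37),
    "Relaxed decision error constraints": (20, 20),
    "Increased gray region width": (13, 13),
    "Simple random sampling with compositing": (64, 16),
}


def design_cost(samples, analyses):
    """Total cost of collecting the samples and running the analyses."""
    return samples * COLLECTION_COST + analyses * ANALYSIS_COST


costs = {name: design_cost(*counts) for name, counts in designs.items()}

for name, cost in costs.items():
    status = "within budget" if cost <= BUDGET else "over budget"
    print(f"{name}: ${cost:,} ({status})")
```

Compositing collects 64 samples but pays for only 16 analyses, which is how it keeps the original decision error limits at a cost of $3,040, just above the approximate budget.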
Contamination of Tarheel County's Sole Drinking Water Source/System
Drinking Water Problem
Week 1: Quarterly monitoring of drinking water did not detect any contaminants above drinking water standards.
Week 2: Groundwater is the drinking water source for Tarheel County. Atrazine was discovered in surface waters (that are hydraulically connected to groundwater) at levels up to 500 ppb, which is well above the maximum contaminant level (MCL) of 3 ppb.
Week 3: Source of contamination has not been identified.
Week 4 - Present: Citizens are concerned about the threat to public health and demand that State and local officials ensure that the water is safe to drink.
Tarheel County Water Supply System
• 6 wells in wellfield
• Water company operates water system
• System capacity: 8.6 million gallons/day (MGD)
• System demand: 3-5 MGD
• System serves 25,000 residents
• Minimal treatment (chlorination only)
• Centralized above-ground storage holds water from all wells
• Capacity is nearly 10 gallons to ensure 4-hour residence time for chlorination
Tarheel County Water Supply System
Assignment:
Decide whether the level of atrazine in
drinking water exceeds the MCL and
requires corrective action.
Data Quality Objectives Decision Error Feasibility
Trials Software (DQO/DEFT)
The Purpose of DEFT
• DEFT determines the feasibility of DQOs based on sample size and cost for several sampling designs
• DQOs are feasible if at least one sampling design can satisfy the DQOs (decision error limits, cost constraints, time limitations, etc.).
Uses of DEFT
• Aids in iterations between steps 6 and 7 of the DQO process
• That is, it provides a smooth transition between the specific DQOs and the development of a data collection design
• As a learning tool, facilitates understanding and communication
What DEFT Cannot Do
DEFT should not be used to decide on a final data collection design or sample size.
It cannot account for differences between:
• Media
• Contaminants
• Spatial boundaries
• Temporal boundaries
How DEFT Works
• Utilizes outputs of the DQO process
• Evaluates several basic collection designs
• Estimates the number of samples
• Estimates costs of data collection designs
What DQO Outputs are Necessary as DEFT Inputs?
• Limits on decision errors
• Action level
• Possible range of parameter (minimum, maximum)
• Cost of sample collection and analysis per sample
• Location and width of gray region
• Estimated standard deviation
• Null hypothesis (H0)
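As a rough illustration of how inputs like these turn into a number of samples, the sketch below uses a common normal-approximation formula for a one-sample test of a mean under simple random sampling. It is not claimed to be DEFT's internal method. The error limits (0.05 false rejection at the action level, 0.20 false acceptance at 0.75 mg/l) and the gray region (0.75 to 1.00 mg/l) are taken from the DEFT output shown in this course; the standard deviation of 0.6 mg/l is an assumption, chosen because it reproduces the 37 samples reported for the cadmium example.

```python
from math import ceil
from statistics import NormalDist


def sample_size(sigma, false_rejection, false_acceptance, gray_width):
    """Approximate n for a one-sample test of a mean (simple random sampling).

    n = sigma^2 * (z_(1-a) + z_(1-b))^2 / width^2 + 0.5 * z_(1-a)^2
    """
    z = NormalDist().inv_cdf
    z_a = z(1.0 - false_rejection)
    z_b = z(1.0 - false_acceptance)
    n = sigma ** 2 * (z_a + z_b) ** 2 / gray_width ** 2 + 0.5 * z_a ** 2
    return ceil(n)


n = sample_size(
    sigma=0.6,               # mg/l, ASSUMED (not stated in the slides)
    false_rejection=0.05,    # limit at the action level (1.00 mg/l)
    false_acceptance=0.20,   # limit at the gray region bound (0.75 mg/l)
    gray_width=1.00 - 0.75,  # mg/l
)
cost = n * (10 + 150)        # $10 collection + $150 TCLP analysis per sample

print(f"samples needed: {n}, cost: ${cost:,}")
```

Widening the gray region or relaxing either error limit lowers the result, which is the trade-off explored when iterating between Steps 6 and 7.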
Analysis of DEFT
Allows user to:
• Determine the effect of changing DQOs
• View Decision Performance Goal Diagram
• Change sampling design
– Simple Random Sampling
– Composite Random Sampling
– Stratified Random Sampling
• Set sample size
• Save DQOs, design information, and decision performance goal diagram to a file
DECISION PERFORMANCE GOAL DIAGRAM
Simple Random Sampling
Sample Size - 37
Cost - $5920.00
Action Level - 1.00
conc. | prob. | type
0.25 | 0.100 | F(a)
0.75 | 0.200 | F(a)
1.00 | 0.050 | F(r)
1.50 | 0.010 | F(r)
[Figure: DEFT screen output. x-axis: Concentration, 0 to 2.25; y-axis: Probability of Deciding that the Mean Exceeds the Action Level, 0 to 1. Marked regions: F(a), Gray Region, Action Level, F(r).]
DEFT in the Project Life Cycle
[Figure: the project life cycle (Planning, Implementing, Assessing) with DEFT marked at two points.]
Beyond the DQO Process
The Project Life Cycle
• Planning: Plan for Data Collection Using the DQO Process (EPA QA/G-4)
• Implementation: Collect Environmental Data Using Documented Sampling Schemes (EPA QA/G-5)
• Assessment: Conduct Data Quality Assessment (EPA QA/G-9)
• Make the Decision
What Is A QA Project Plan?
• Mandatory planning document
• Part of mandatory Agency-wide Quality System
• Description of how data will be collected, assessed, and analyzed
• Project Blueprint - who, what, where, when, why
• Living document that is revised to reflect significant changes
QA Project Plans (QAPPs)
• QAPPs must be approved prior to the start of data collection
• QAPPs are required when environmental data operations occur in:
– Intramural projects
– Contracts, work assignments, delivery orders
– Grants, cooperative agreements
– Interagency agreements (when negotiated)
– State-EPA agreements
– Responses to statutory or regulatory requirements and to consent agreements
What Does A QA Project Plan Do For You?
When you are asked:
− "What did you do?"
− "How did you do it?"
− "Why did you do it?"
− "Did you do it correctly?"
The QA Project Plan has the answer.
Elements of a QA Project Plan
Group A. Project Management
Group B. Data Generation and Acquisition
Group C. Assessment and Oversight
Group D. Data Validation and Usability
Group A: Project Management Elements
1. Title and Approval Sheet
2. Table of Contents
3. Distribution List
4. Project/Task Organization
5. Problem Definition/Background
6. Project/Task Description
7. Quality Objectives and Criteria
8. Special Training Requirements/Certification
9. Documentation and Records
Group B: Data Generation & Acquisition Elements
1. Sampling Process Design (Experimental Design)
2. Sampling Methods Requirements
3. Sample Handling and Custody Requirements
4. Analytical Methods Requirements
5. Quality Control Requirements
6. Instrument/Equipment Testing, Inspection, and Maintenance Requirements
7. Instrument Calibration and Frequency
8. Inspection/Acceptance Requirements for Supplies and Consumables
9. Data Acquisition Requirements (Non-Direct Measurements)
10. Data Management
Elements in Group C & Group D
Group C: Assessment & Oversight Elements
1. Assessments and Response Actions
2. Reports to Management
Group D: Data Validation & Usability Elements
1. Data Review, Validation, and Verification Requirements
2. Validation and Verification Methods
3. Reconciliation with User Requirements
Data Quality Assessment (DQA)
• A process to determine if data are adequate for their intended use
– scientific and statistical evaluation
– determine if data are of the right type, quality, and quantity
• Sample data are used to make decisions during DQA
• Do the data provide "sufficient evidence" to draw conclusions?
Data Quality
• Data quality is meaningful only when "data quality" relates to intended use of data
• Some data are of adequate quality for some purposes but not for others
• Need to determine if the data are of the right type, quality, and quantity for their intended use
Data Quality Assessment
Can Answer:
– Do the data violate the conceptual site model or test assumptions?
– Did I collect enough data?
– What is my conclusion?
Cannot Answer:
– Did I make a decision error? (good decision -- bad outcome)
– What are the "true" conditions?
– Do I need different types of data?
DQA is a Joint Effort
Decision maker's contribution:
− Inspection of data for scientific anomalies
− Responsibility for transcription errors
− Assessment of effect of QA and QC deviations
− Professional contextual judgment
Statistician's contribution:
− Graphical display of data and trends
− Statistical analysis required by the DQO
− Investigation of assumption violations
− Identification of potential outliers
− Providing direction for data improvement
The 5 Steps of Data Quality Assessment
1. Review the DQOs and Sampling Design
2. Conduct a Preliminary Data Review
3. Select the Statistical Test
4. Verify the Assumptions of the Statistical Test
5. Draw Conclusions from the Data
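A minimal sketch of Steps 3 and 5 for the fly ash example, assuming a one-sample t-test is the selected test. The ten TCLP results are hypothetical values made up for illustration, and the critical value is the standard one-sided 5% t-table entry for 9 degrees of freedom. Step 4 (checking the test's assumptions, such as approximate normality) is not shown.

```python
from math import sqrt
from statistics import mean, stdev

ACTION_LEVEL = 1.0   # mg/l, RCRA standard for cadmium in TCLP leachate
T_CRITICAL = 1.833   # one-sided t, 5% level, 9 degrees of freedom (t-table)

# Hypothetical TCLP cadmium results (mg/l) for one container.
results = [0.62, 0.81, 0.55, 0.93, 0.70, 0.48, 0.77, 0.66, 0.85, 0.59]

n = len(results)
standard_error = stdev(results) / sqrt(n)
t_stat = (mean(results) - ACTION_LEVEL) / standard_error

# Baseline condition: the ash is hazardous (mean > 1.0 mg/l).
# Reject the baseline only if the sample mean is convincingly below 1.0.
reject_baseline = t_stat < -T_CRITICAL
disposal = "municipal landfill" if reject_baseline else "RCRA facility"

print(f"mean = {mean(results):.3f} mg/l, t = {t_stat:.2f} -> {disposal}")
```

Because the baseline is "hazardous", a container goes to the municipal landfill only when the data provide sufficient evidence against that baseline; otherwise it defaults to the RCRA facility.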
Guidance for Data Quality Assessment: Practical Methods for Data Analysis (G-9)
• Written for non-statisticians
• Supplements Agency guidance
• Does not replace statistical texts
• Regular supplements
– Current examples
– Shared information
DataQUEST
• A PC-based software package that performs baseline Data Quality Assessment
• Provides simple tools to a wide audience
• Implements statistical methods described in guidance (G-9)
• Supplements guidance so description of statistical tools is not contained in the User's Guide
Advantages
• Menu-based System - no special language or commands like statistical packages
• Does not treat data as discrete numbers in graphs, as spreadsheets do
• More standard statistical graphs than spreadsheets
QA Guidance
www.epa.gov/quality
Guidance for the Data Quality Objectives Process (G-4)
− Planning process that ties data collection designs to user-defined decision error tolerances
Guidance for QA Project Plans (G-5)
− Utilizes outputs of the DQO Process for detailing data collection operations, the "blueprint" of data collection
Guidance for Data Quality Assessment (G-9)
− Assessment of data to establish if they meet user-defined decision error limits